(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date
18 July 2002 (18.07.2002)

PCT

(10) International Publication Number
WO 02/055995 A2



(51) International Patent Classification: <i(H\ 3/00

(21) International Application Number: PCT/US02/00660

(22) International Filing Date: 10 January 2002 (10.01.2002)
(25) Filing Language: English
(26) Publication Language: English



(30) Priority Data:
10 January 2001 (10.01.2001) US
23 March 2001 (23.03.2001) US
10 January 2002 (10.01.2002) US

(71) Applicant (for all designated States except US): THE PENN STATE RESEARCH FOUNDATION [US/US]; 304 Old Main, University Park, PA 16802 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): MARANAS, Costas [US/US]; 304 Old Main, University Park, PA 16802 (US). BURGARD, Anthony, P. [US/US]; 304 Old Main, University Park, PA 16802 (US).

(74) Agent: NEBEL, Heidi, S.; Zarley, McKee, Thomte, Voorhees & Sease, P.L.C., Suite 3200, 801 Grand Avenue, Des Moines, IA 50309-2721 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published:
without international search report and to be republished upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



(54) Title: METHOD AND SYSTEM FOR MODELING CELLULAR METABOLISM

[Front-page drawing: block diagram with the labels FLUX BALANCE ANALYSIS MODELS; KINETIC/REGULATORY LOGIC CONSTRAINTS; DIFFERENTIAL DNA MICROARRAY CONSTRAINTS; MILP SOLUTION TECHNIQUES; OBJECTIVE FUNCTION HYPOTHESIS TESTING; GENE ADDITIONS (UNIVERSAL MATRIX); GENE DELETIONS (MINIMAL SETS)]



(57) Abstract: This invention relates to methods and systems for in silico or bioinformatic modeling of cellular metabolism. The invention includes methods and systems for modeling cellular metabolism of an organism, comprising constructing a flux balance analysis model and applying constraints to the flux balance analysis model, the constraints selected from the set consisting of qualitative kinetic information constraints, qualitative regulatory information constraints, and differential DNA microarray experimental data constraints. In addition, the present invention provides for computational procedures for solving metabolic problems.



WO 02/055995

PCT/US02/00660



TITLE: METHOD AND SYSTEM FOR MODELING CELLULAR METABOLISM

PRIORITY STATEMENT

This application claims priority to Provisional Patent Application No. 60/260,713 filed January 10, 2001 and Provisional Patent Application No. 60/278,535 filed March 23, 2001, both of which are herein incorporated by reference in their entirety.

FIELD OF THE INVENTION

This invention relates to methods and systems for in silico or bioinformatic modeling of cellular metabolism. More specifically, although not exclusively, this invention relates to a framework of models and methods that improve upon flux balance analysis (FBA) models through incorporation of particular constraints. These constraints incorporate, without limitation, qualitative kinetic information, qualitative regulatory information, and/or DNA microarray experimental data. Further, the present invention relates to solving various metabolic problems using particular computational procedures.

BACKGROUND OF THE INVENTION

Metabolic pathway engineering has attracted significant interest in recent years, catalyzed by the rapidly increasing number of sequenced microbial genes. As of January 2001, over fifty microbial genomes were completely sequenced. Bioinformatic tools have allowed the functional assignment of 45 to 80% of their coding regions. E. Pennisi, Science 277, 1432 (1997). This newly acquired information is used in conjunction with microbial mathematical models to calculate the response of metabolic networks after gene knockouts or additions. For example, such information was used to increase ethanol production in metabolically engineered E. coli cells. V. Hatzimanikatis, et al., Biotechnol. Bioeng. 58, 154 (1998).






In general, mathematical models of cellular metabolism fall into two distinct categories: ones that incorporate kinetic and regulatory information and others that include only the stoichiometry of the reaction pathways. The first class of models matches cellular behavior at an original steady state and then employs kinetic and regulatory relations to examine how the cell behaves away from this steady state in the presence of small perturbations brought about by environmental changes or enzyme engineering. The key advantage of this first class of methods is that upon application a unique point in the metabolite flux space is identified. The disadvantage is that the required kinetic parameters are difficult to estimate, and their accuracy and reproducibility may deteriorate rapidly as the system moves far away from the original steady state.

The second class of models, flux balance analyses, utilizes only the stoichiometric mass balances of the metabolic network and cellular composition information, in the absence of detailed kinetic and thermodynamic data, to identify boundaries for the flux distributions available to the cell. Although microorganisms have evolved highly complex control structures that eventually collapse these available boundaries into single points, flux balance models are still valuable in setting upper bounds for performance targets and in identifying "ideal" flux distributions.

However, the versatility of flux balance analysis comes at the expense of unknowingly crossing kinetic or regulatory flux barriers. Flux balance model predictions must thus be cautiously interpreted as "ideal" flux distributions yielding upper bounds to the performance of the metabolic network. The key advantage of flux balance models is that, by not requiring any numerical values for kinetic parameters or regulatory loops, they are straightforward to compile. The key disadvantage is that the obtained stoichiometric boundaries can be very wide, and it is hard to envision that the biomass maximization conjecture, while useful under certain conditions, is generally applicable.

It is, therefore, a primary object of the present invention to provide a method and system that improves upon the state of the art.






It IS a further object of the present invention to provide a method and 
system that provides a framework for improving upon flux balance analysis 
models. 

It is a still further object of the present invention to provide a method and system that allows the predictive capabilities of flux balance analysis models to be enhanced.

Another object of the present invention is to provide a method and 
system that incorporates qualitative kinetic and/or regulatory information into 
a flux balance analysis model. 

Yet another object of the present invention is to provide a method and
system that incorporates differential DNA microarray experimental data into 
a flux balance analysis model. 

A further object of the present invention is to provide an improved 
method and system for determining minimal reaction sets for growth. 

Another object of the present invention is to provide an improved 
method and system for determining the effect of environmental conditions on 
minimal reaction sets. 

It is another object of the present invention to provide a method for 
calculating the response of metabolic networks after gene knockouts or 
additions. 

A still further object of the present invention is to provide a method and 
system for selecting mathematically optimal genes for recombination. 

Another object of the present invention is to provide a method and 
system for identifying lethal gene deletions. 

Yet another object of the present invention is to provide a method and 
system for identifying gene therapeutic candidates for pathogenic microbes. 

A still further object of the present invention is to provide a method and 
system capable of testing hypotheses or objective functions. 

These and other objects, features and/or advantages of the present 
invention will become apparent from the specification and claims.






SUMMARY OF THE INVENTION

This invention includes a framework for in silico or bioinformatic modeling of cellular metabolism. The framework allows for an improvement to FBA models through incorporation of particular constraints. Preferably, these constraints are logic constraints that can be represented with binary variables. The framework provides for applying computational procedures in order to solve for model predictions. The model can be used to determine: how many and which foreign genes should be recombined into an existing metabolic network; which regulatory loops should be activated or inactivated so that a given metabolic target is optimized; how robust a metabolic network is to gene deletion; what the mathematically minimal set of genes capable of meeting certain growth demands for a given uptake environment is; whether experimental flux data, under different substrates and carbon/oxygen uptake rates, are consistent with different hypothesized objective functions; and other metabolic problems. The results obtained from use of this framework can be applied in a number of areas of research or commercial interest related to metabolic engineering, including areas in the biological, chemical, pharmaceutical, life sciences, and medical fields.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram showing an overview of the present 
invention. 

Figure 2 is a diagram of multiple objective function slopes consistent with the same optimum point.

Figure 3 is a set of feasible objectives for different conditions.

Figure 4 is a pictorial representation of stoichiometric boundaries, 
kinetic/regulatory barriers and a new optimal steady state. 

Figure 5 is a diagram of a simple network showing the application of logic constraints.

Figure 6 is a diagram of two parts of a metabolic network where bottlenecks are identified.






Figure 7 is a logarithmic plot of probability of flux/transcript ratio 
agreement versus transcript ratio. 

Figure 8 is a plot of minimum acetate uptake rate versus α for a 0.3 h⁻¹ growth rate.

Figure 9 is a table of model predictions for maximum theoretical yields of seven amino acids for growth on glucose and acetate.

Figure 10 is a diagram showing the pathway modifications introduced in a recombined network for growth on glucose. Figure 10 shows the difference between optimal E. coli and Universal arginine production pathways for growth on glucose, including (a) the pyrophosphate-dependent analog of 6-phosphofructokinase in the Universal model replacing the ATP-dependent version present in E. coli, and (b) carbamate kinase in the Universal model replacing carbamoyl phosphate synthetase from the E. coli network.

Figure 11 is a graph showing the size of minimal reaction networks as a function of imposed growth rate for (a) growth on only glucose and (b) growth on a medium allowing for the uptake of any organic compound with a corresponding transport reaction.

Figure 12 is a table showing modifications to the Pramanik and 
Keasling model. 

Figure 13 is a graph showing gene knockouts at various biomass 
production levels for growth on glucose. 

Figure 14 is a table showing genes selected for removal by knockout study.

Figure 15 is a table showing model selections of enzymatic reactions 
that will enhance the amino acid production capabilities of E. coli. 

Figure 16 illustrates optimal E. coli and Universal arginine production pathways for growth on glucose. The utilization of carbamate kinase and the pyrophosphate-dependent analog of 6-phosphofructokinase by the Universal arginine production pathway preserves a net of 3 ATP phosphoanhydride bonds.






Figure 17 illustrates optimal E. coli and Universal arginine production pathways for growth on acetate. The incorporation of carbamate kinase and the pyrophosphate-dependent analog of acetate kinase by the Universal pathway saves 3 ATP phosphoanhydride bonds.

Figure 18 illustrates optimal asparagine production pathways for two modes of glucose utilization: glucokinase and the phosphotransferase system.

Figure 19 illustrates an optimal Universal asparagine production pathway for growth on glucose. The Universal pathway conserves the equivalent of 1 ATP bond by using an ADP-forming aspartate-ammonia ligase instead of an AMP-forming version as shown in the previous figure.

Figure 20 illustrates optimal E. coli and Universal histidine production pathways for growth on acetate. Both the energy efficiency (2 ATPs) and carbon conversion efficiency of the Universal pathway are improved by the incorporation of a pyrophosphate-dependent analog of PEP carboxykinase and glycine dehydrogenase, respectively.

Figure 21 is a graph of the number of reactions in each minimal set as a function of the imposed growth demands for a glucose-only or acetate-only uptake environment.

Figure 22 is a table showing evolution of minimal reaction sets under 
decreasing growth conditions. 

Figure 23 is a table showing metabolites taken up or secreted at each target growth rate on an optimally engineered medium.

Figures 24 and 25 are graphs of the number of reactions in each minimal set as a function of the imposed growth demands for uptake environments allowing multiple organic uptakes.

Figure 26 is a table showing evolution of minimal reaction sets for a second set under decreasing growth requirements.

Figure 27 is a table showing functional classification of minimal 
network reactions for growth on an optimally engineered medium. 

Figure 28 is a table showing a comparison of minimal metabolic gene/reaction sets based on functional classification.




DETAILED DESCRIPTION OF THE INVENTION
1. OVERVIEW 

Figure 1 illustrates the framework of the present invention. This framework improves upon flux balance analysis (FBA) models through incorporation of particular constraints. These constraints incorporate, without limitation, qualitative kinetic information, qualitative regulatory information, and/or DNA microarray experimental data. Preferably, these constraints are logic constraints that can be represented with binary variables. The invention also provides for including computational procedures such as mixed-integer linear programming into the framework in order to use the model to arrive at a solution. As shown in Figure 1, the model provides for determining metabolic performance/robustness in the face of gene additions or deletions. In addition, the model provides for testing whether experimental flux data, under different substrates and carbon/oxygen uptake rates, are consistent with different hypothesized objective functions.

The present invention involves a process for tightening the flux boundaries derived through flux balance models and subsequently probing the performance limits of metabolic networks in the presence of gene additions or deletions. Given the large number of genes (hundreds to thousands) available for recombination, present optimization formulations reach and sometimes exceed the limit of what can be solved with state-of-the-art mixed-integer linear programming solvers. The present invention meets the dual objectives of constructing modeling formulations that enable an effective query of the performance limits of metabolic networks and of providing customized techniques for solving the resulting mixed-integer linear programming problems.



2. OBJECTIVE FUNCTION HYPOTHESIS TESTING
The present invention provides for an unbiased, mathematically rigorous framework for testing whether experimental flux data, under different substrates and carbon/oxygen uptake rates, are consistent with different hypothesized objective functions. A. Varma and B. O. Palsson, Bio/Technology 12, 994 (1994); R. A. Majewski and M. M. Domach, Biotechnol. Bioeng. 35, 732 (1990). Rather than starting by postulating such an objective function, or even accepting that there exists an objective function governing cellular behavior, the quantitative framework of the present invention is based on inverse optimization, which enables researchers to test, disprove, or fine-tune the consistency of different hypotheses. Note that while one can never prove the existence of such an objective function, the framework is useful for rigorously testing whether experimental data are consistent or inconsistent with a postulated objective function and how this may change under different environmental conditions.

Inverse optimization concepts, pioneered in geophysics for the identification of model parameters for systems reaching optimality given a set of observables, are applied here. Specifically, the present invention provides for finding the coefficients $c_j$ in a hypothesized linear objective function $\sum_j c_j v_j$ that are consistent with the subset of observed fluxes $v_j^*$ (e.g., substrate/oxygen uptakes, growth rate, etc.). In general, not a single set of values but rather a range of values for the coefficients $c_j$ is consistent with a set of observed fluxes. This is illustrated with Figure 2A in two dimensions.

Any objective function $c_1 v_1 + c_2 v_2$ whose slope $-c_2/c_1$ lies between values $a$ and $b$ is consistent with the optimality of point A. This gives rise to the range of values for $c_1$ and $c_2$, denoted by the line segment between points B and C in Figure 2B, that are consistent with the optimality of point A. Note that $c_1$ and $c_2$ were scaled so that $c_1 + c_2 = 1$. In the general $n$-dimensional case, the set of $c_j$ values in compliance with an optimum $v_j^*$ forms a polytope.
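For the two-dimensional case of Figure 2, the consistent range of coefficients can be computed directly. The sketch below is an illustration only, not the restricted-dual procedure described next. It assumes the feasible flux region is a bounded polygon given by a hypothetical list of vertices, normalizes the coefficients as in the text ($c_1 + c_2 = 1$, both non-negative), and collects the linear inequalities in $c_1$ that keep the observed point A at least as good as every vertex. Because a linear objective over a bounded polygon is maximized at a vertex, the result is the segment of Figure 2B in this setting.

```python
from fractions import Fraction

def consistent_c1_interval(vertices, optimum):
    """Interval [lo, hi] of c1 (with c2 = 1 - c1, 0 <= c1 <= 1) for which
    `optimum` maximizes c1*v1 + c2*v2 over the polygon's vertices.
    Returns None if no such c1 exists."""
    lo, hi = Fraction(0), Fraction(1)
    a1, a2 = optimum
    for w1, w2 in vertices:
        # Optimality of A against vertex W:
        #   c1*a1 + (1 - c1)*a2 >= c1*w1 + (1 - c1)*w2
        #   <=>  c1 * ((a1 - a2) - (w1 - w2)) >= w2 - a2
        coef = Fraction(a1 - a2) - Fraction(w1 - w2)
        rhs = Fraction(w2 - a2)
        if coef > 0:
            lo = max(lo, rhs / coef)
        elif coef < 0:
            hi = min(hi, rhs / coef)   # dividing by a negative flips the inequality
        elif rhs > 0:
            return None                # W beats A for every c1
    return (lo, hi) if lo <= hi else None

# Hypothetical convex feasible region and observed optimum A = (3, 2).
polygon = [(0, 0), (4, 0), (3, 2), (0, 3)]
print(consistent_c1_interval(polygon, (3, 2)))  # (Fraction(1, 4), Fraction(2, 3))
```

Exact rational arithmetic is used so that the endpoints of the segment (where A ties with a neighboring vertex) are reported without rounding.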

The general problem is addressed using the ideas introduced by Ahuja and Orlin (2001). R. K. Ahuja, et al., Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, N.J., 1993. Given an observed subset of fluxes $v_j^*$, the set of objective function coefficients $c_j$ can be determined by finding all multiple optimal solutions of the restricted dual feasibility problem solved in the space of the dual variables $\pi_i$ and the linear objective function coefficients $c_j$.

The dual variables $\pi_i$ quantify the relative importance of a metabolite $i$ towards improving the objective function. The solution of the restricted dual problem systematically characterizes the set of all possible $c_j$ values consistent with a subset of observed fluxes $v_j^*$. These alternate optimal solutions can be obtained as a byproduct of the simplex method, since any basic feasible solution from the simplex tableau defines a vertex of the polytope formed in the $c_j$ space. An alternate method using integer cuts can also be employed. S. Lee, et al., Comput. Chem. Eng. 24, 711 (2000). The present invention contemplates that with these techniques a determination can be made as to whether the polytopes overlap considerably (see Figure 3A) or migrate systematically (see Figure 3B) as the substrate choice or uptake rate of carbon/oxygen changes. This set of quantitative tools provides an unbiased framework for researchers to test the range of validity (if any) of different hypotheses.
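In the normalized two-dimensional setting ($c_1 + c_2 = 1$), each condition's set of consistent coefficients reduces to an interval of $c_1$ values, and the Figure 3 distinction between overlapping and systematically migrating sets can be checked in a few lines. This is a minimal sketch under that assumption. The interval values in the example are invented for illustration, not experimental results, and the monotone-midpoint test is only one simple heuristic for "migration".

```python
def common_interval(intervals):
    """Intersect the [lo, hi] intervals of c1 obtained under several conditions.
    Returns None when no single objective is consistent with all of them."""
    lo = max(iv[0] for iv in intervals)
    hi = min(iv[1] for iv in intervals)
    return (lo, hi) if lo <= hi else None

def migrates_monotonically(intervals):
    """Heuristic: the interval midpoints move in one direction as the
    condition (e.g. uptake rate) changes in the listed order."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    steps = list(zip(mids, mids[1:]))
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)

# Hypothetical c1 intervals for three conditions, ordered by uptake rate.
overlapping = [(0.20, 0.60), (0.30, 0.70), (0.25, 0.65)]
migrating = [(0.00, 0.20), (0.30, 0.50), (0.60, 0.90)]
print(common_interval(overlapping), migrates_monotonically(overlapping))
print(common_interval(migrating), migrates_monotonically(migrating))
```

A non-empty common interval corresponds to the Figure 3A situation, in which one objective function explains all conditions; an empty intersection with steadily shifting midpoints corresponds to Figure 3B.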

3. KINETIC/REGULATORY LOGIC CONSTRAINTS 

Flux balance models, by relying solely on stoichiometric balances and uptake rates, are guaranteed not to exclude any feasible flux distributions. However, this versatility may lead to overly optimistic expectations if the results are not interpreted properly. The flux distributions within the cell are ultimately uniquely determined by the regulatory mechanisms within the cell, the kinetic characteristics of cellular enzymes, and the expression of these enzymes. Assuming cells operate in a stoichiometrically optimal fashion may yield metabolic flux distributions not available to the cell. The present invention provides for multiple methods for tightening the stoichiometric flux boundaries predicted by FBA models. A first strategy involves attempting to ensure that flux changes identified through FBA are consistent, in a qualitative sense, with the kinetics and regulatory loops of the metabolic network. By uncovering unreachable domains within the stoichiometric flux boundaries, the predictive capabilities are improved. A second strategy entails incorporating experimentally obtained data into the FBA model. The present invention includes a mathematically sound framework for superimposing DNA array differential expression data onto FBA models.



3.1 Kinetic and Regulatory Loop Consistency 

The key question addressed here is whether the optimal flux distributions predicted by the FBA models are reachable by the cell or whether kinetic and/or regulatory boundaries will prohibit the system from reaching the stoichiometric boundaries (see Figure 4).

The key idea we propose to explore is to ensure, by using logic relations, that when, in response to environmental changes, the metabolic network shifts from one steady state to another, up or down changes in metabolite concentrations are consistent with up or down changes in reaction fluxes.

Specifically (see Figure 5), flux v can increase, in the absence of enzyme engineering, only if the concentration $C_A$ of reactant A or the concentration $C_D$ of activator D increases, or the concentration $C_E$ of inhibitor E decreases. Clearly, changes in the reaction fluxes and metabolite concentrations are coupled, and even in the absence of detailed quantitative kinetic/regulatory information, binding relations can be derived based on the direction of these changes. One such set of relations is described in detail below.

Specifically, for any reaction flux $v_j$ to increase above an initial base case value $v_j^0$, either the concentration of a reactant must increase, or the concentration of an activator must increase, or the concentration of an inhibitor must decrease, and vice versa. Incorporating these logic constraints into the FBA framework first requires establishing a regulation matrix $F$ describing the effect of metabolite $i$ on reaction $j$:

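$$
F_{ij} =
\begin{cases}
1 & \text{if metabolite } i \text{ activates reaction } j \\
-1 & \text{if metabolite } i \text{ inhibits reaction } j \\
0 & \text{if metabolite } i \text{ has no effect on reaction } j
\end{cases}
$$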
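Once activation and inhibition records have been gathered, the regulation matrix can be stored as a small dense array. The sketch below assumes a hypothetical list of (metabolite, reaction, effect) triples, such as might be compiled by hand from EcoCyc or MetaCyc entries; it does not query either database. In this sketch, reactants are deliberately not entered in $F$, because the definition above only encodes activation, inhibition, or no effect.

```python
EFFECT_CODE = {"activates": 1, "inhibits": -1}

def regulation_matrix(metabolites, reactions, interactions):
    """Build F as a list of rows: F[i][j] = 1 (activates), -1 (inhibits), 0 (no effect).

    `interactions` is a hypothetical list of (metabolite, reaction, effect) triples.
    """
    row = {m: i for i, m in enumerate(metabolites)}
    col = {r: j for j, r in enumerate(reactions)}
    F = [[0] * len(reactions) for _ in metabolites]
    for met, rxn, effect in interactions:
        F[row[met]][col[rxn]] = EFFECT_CODE[effect]
    return F

# Figure 5 style example: D activates flux v, E inhibits it, A is only a reactant.
F = regulation_matrix(["A", "D", "E"], ["v"],
                      [("D", "v", "activates"), ("E", "v", "inhibits")])
print(F)  # [[0], [1], [-1]]
```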

Such a regulation matrix can be constructed based on information from the EcoCyc and MetaCyc databases. P. D. Karp, et al., Nucleic Acids Res. 28, 55 (2000). Additional database resources also exist for non-E. coli reactions. M. Kanehisa and S. Goto, Nucleic Acids Res. 28, 29 (2000). Two sets of 0-1 variables, $x_i$ and $z_j$, are introduced to track up or down movements in metabolite concentrations and reaction fluxes, respectively:

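$$
x_i =
\begin{cases}
1 & \text{if the concentration of metabolite } i \text{ rises} \\
0 & \text{otherwise}
\end{cases}
$$

$$
z_j =
\begin{cases}
1 & \text{if reaction flux } j \text{ increases above its original steady-state value} \\
0 & \text{otherwise}
\end{cases}
$$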

By utilizing these 0-1 variables, we incorporate the following logic constraints into the FBA model for safeguarding against the violation of some of the kinetic and regulatory barriers:

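$$(1 - z_j)\, v_j^{\min} + z_j\, v_j^0 \;\le\; v_j \;\le\; (1 - z_j)\, v_j^0 + z_j\, v_j^{\max} \qquad \forall j \tag{1}$$

$$\sum_{i \in R_j} x_i \;+ \sum_{i:\, F_{ij} = 1} x_i \;+ \sum_{i:\, F_{ij} = -1} (1 - x_i) \;\ge\; z_j \qquad \forall j \tag{2}$$

$$\sum_{i \in R_j} (1 - x_i) \;+ \sum_{i:\, F_{ij} = 1} (1 - x_i) \;+ \sum_{i:\, F_{ij} = -1} x_i \;\ge\; 1 - z_j \qquad \forall j \tag{3}$$

where $R_j$ denotes the set of reactants of reaction $j$, and $v_j^{\min}$ and $v_j^{\max}$ are the lower and upper bounds on $v_j$.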

Relation (1) ensures that $v_j \ge v_j^0$ when $z_j = 1$ as well as $v_j \le v_j^0$ when $z_j = 0$. Constraint (2) ensures that the concentration of a reactant must increase, the concentration of an activator must increase, or the concentration of an inhibitor must decrease for a reaction flux $v_j$ to increase above its initial base case value $v_j^0$. The last constraint (3) ensures that the concentration of a reactant must decrease, the concentration of an activator must decrease, or the concentration of an inhibitor must increase for a reaction flux $v_j$ to decrease below the initial base case value. Revisiting the example of Figure 5, constraints (2) and (3) for flux $v$ yield

$$x_A + x_D + (1 - x_E) \ge z \qquad \text{and} \qquad (1 - x_A) + (1 - x_D) + x_E \ge 1 - z$$
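The effect of the logic constraints on a single reaction can be checked by brute force over the binary variables. The sketch below assumes constraint (1) has the form $(1-z_j)v_j^{\min} + z_j v_j^0 \le v_j \le (1-z_j)v_j^0 + z_j v_j^{\max}$ and that (2) and (3) sum over the reactants, activators and inhibitors of reaction $j$ as described in the text. Metabolite indicators $x_i$ are plain Python dictionaries, and the numeric flux bounds in the example are arbitrary.

```python
def admissible_z(x, reactants, activators, inhibitors):
    """Values of z_j allowed by constraints (2) and (3) for one reaction.

    x maps each metabolite to 1 if its concentration rises, 0 otherwise.
    """
    up = (sum(x[m] for m in reactants) + sum(x[m] for m in activators)
          + sum(1 - x[m] for m in inhibitors))          # left-hand side of (2)
    down = (sum(1 - x[m] for m in reactants) + sum(1 - x[m] for m in activators)
            + sum(x[m] for m in inhibitors))            # left-hand side of (3)
    return [z for z in (0, 1) if up >= z and down >= 1 - z]

def flux_interval(z, v0, vmin, vmax):
    """Constraint (1): z = 1 gives [v0, vmax]; z = 0 gives [vmin, v0]."""
    return ((1 - z) * vmin + z * v0, (1 - z) * v0 + z * vmax)

# Figure 5: reactant A, activator D, inhibitor E of flux v.
falling = {"A": 0, "D": 0, "E": 1}   # A and D fall, E rises
rising = {"A": 1, "D": 1, "E": 0}    # A and D rise, E falls
print(admissible_z(falling, ["A"], ["D"], ["E"]))      # [0]
print(admissible_z(rising, ["A"], ["D"], ["E"]))       # [1]
print(flux_interval(0, v0=2.0, vmin=0.0, vmax=10.0))   # (0.0, 2.0)
```

When the signals are mixed (for example, A and E both rise), both values of $z_j$ remain admissible, so the constraints only bind when every qualitative signal points the same way.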

Preliminary work on the alanine overproduction pathway for growth on glucose identified kinetic and regulatory bottlenecks that were not detectable by simple FBA models.

The first step in this analysis was to obtain the initial base case values for the reaction fluxes. These were obtained by solving the LP problem for maximum biomass formation. The second step was to solve a second LP problem constraining the biomass production to 80% of its optimal value and allowing for the overproduction of alanine. The third step involved re-solving the second-step scenario with the incorporation of the kinetic and regulatory logic constraints described above. This study revealed that the overproduction of alanine subject to regulation (2.688 mmol/10 mmol GLC) is about 20% less than the value predicted by the FBA model without the logic-based regulatory constraints (3.298 mmol/10 mmol GLC). More important than being able to identify this reduction is the capability to pinpoint specific flux bottlenecks. Analysis of the reaction fluxes revealed two potential bottlenecks limiting the performance of the network (see Figure 6).
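The "about 20%" figure follows from the two reported alanine yields. The check below uses only those two values; the exact relative reduction is roughly 18.5%, which the text rounds to about 20%.

```python
fba_yield = 3.298     # mmol alanine / 10 mmol glucose, FBA without logic constraints
logic_yield = 2.688   # mmol alanine / 10 mmol glucose, with logic constraints
reduction = (fba_yield - logic_yield) / fba_yield
print(f"{reduction:.1%}")  # 18.5%
```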

The first bottleneck (Fig. 6A) arises because, in addition to its role in the pentose phosphate pathway reactions, ribulose-5-phosphate (RL5P) is also a precursor to lipopolysaccharide (LPS), which is a component of biomass. Under less than optimal growth demands, the reaction flux from RL5P to biomass must decrease below its base case value. Thus the concentration of RL5P, the only regulator of this flux, must decrease. Therefore, the flux through ribulose-phosphate 3-epimerase cannot increase above its base case value because the concentration of the reactant RL5P is decreasing. This diverts additional flux through the ribose-5-phosphate isomerase reaction. The second bottleneck (Fig. 6B) occurs because during alanine overproduction more flux must pass through pyruvate kinase than under maximum growth conditions. In this study, at the base case, the FBA model chose pyruvate kinase II, which is one of the two isozymes of pyruvate kinase. However, the flux through pyruvate kinase II cannot increase above its base case value because the concentrations of both its activator (AMP) and its reactants are decreasing. The FBA model including regulation partially circumvented this barrier by increasing the flux through pyruvate kinase I, since the concentration of an activator (FDP) of this reaction is increasing. This example suggests that the logic constraints, by capturing some kinetic and regulatory information, are capable of identifying at least some of the bottlenecks undetectable by simple FBA models without excluding any feasible flux distributions. Identifying these key fluxes as described above and then engineering the enzymes and regulation around them provides a straightforward debottlenecking strategy. The present invention contemplates that one skilled in the art and having the benefit of this disclosure can construct additional logic constraints in the spirit of the ones described above to further "tighten" the predictions of flux balance models.
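Both bottlenecks can be read off the "increase" condition of the logic constraints: a flux may rise above its base value only if at least one reactant or activator rises, or at least one inhibitor falls. The sketch below encodes only the concentration directions stated in the text (RL5P, AMP and the pyruvate kinase reactants falling; FDP rising). Naming the pyruvate kinase reactants PEP and ADP follows the standard reaction PEP + ADP → pyruvate + ATP. Giving each reaction no inhibitors is a simplifying assumption of this example, not a claim about the model used in the study.

```python
def can_increase(x, reactants, activators=(), inhibitors=()):
    """'Increase' logic condition with z_j = 1: true if some reactant or
    activator rises (x = 1) or some inhibitor falls (x = 0)."""
    return (any(x[m] for m in reactants) or any(x[m] for m in activators)
            or any(not x[m] for m in inhibitors))

# 1 = concentration rises, 0 = falls, during alanine overproduction (per the text).
x = {"RL5P": 0, "PEP": 0, "ADP": 0, "AMP": 0, "FDP": 1}

print(can_increase(x, ["RL5P"]))                            # ribulose-phosphate 3-epimerase: False
print(can_increase(x, ["PEP", "ADP"], activators=["AMP"]))  # pyruvate kinase II: False
print(can_increase(x, ["PEP", "ADP"], activators=["FDP"]))  # pyruvate kinase I: True
```

The two pyruvate kinase isozymes share the same reactants, so in this reduced picture only the differing activator (AMP versus FDP) decides which isozyme can carry the extra flux.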



4. DIFFERENTIAL DNA MICROARRAY CONSTRAINTS 

In addition to using qualitative kinetic and/or qualitative regulatory information to define logic constraints for enhancing the predictive capabilities of flux balance models, the present invention provides for defining constraints based on experimental differential DNA microarray data. The recent development of DNA microarray technology has started to revolutionize the investigation of cellular global regulation on the whole-genome scale. DNA microarrays enable the determination of differential transcription profiles, consisting of the relative expression levels of individual genes under various experimental conditions. This allows one to infer which genes are up-regulated or down-regulated as an organism responds to external environmental changes. Such studies have already been initiated for S. cerevisiae (L. Wodicka, et al., Nat. Biotechnol. 15(13), 1359 (1997)) and E. coli (C. S. Richmond, et al., Nucleic Acids Res. 27(19), 3821 (1999)). The output of such experiments is typically a set of gene transcript levels normalized with respect to an original steady state. For example, the differential transcript levels of 111 genes involved in central metabolism and key biosyntheses have been measured for an E. coli strain grown on either a glycerol or acetate medium relative to a glucose reference condition. M. K. Oh & J. C. Liao, Biotechnol. Prog. 16(2), 278 (2000). Thus, a transcript level of 1.5 for a gene in the E. coli strain grown on acetate indicates that this gene is up-regulated by 50% during growth on acetate as compared to growth on glucose. Although this methodology cannot detect translational or post-translational regulation, transcriptional regulation is, with a few exceptions, the main mode of regulation, at least in E. coli.

The key challenge is that at present transcript levels cannot be used to infer quantitative changes in the corresponding flux levels. Instead, at best only a qualitative statistical correlation between changes in fluxes and transcript levels can be drawn. Based on a qualitative linking between fluxes and transcript levels, the present invention uses 0-1 variables to capture these trends. Let $T_j$ denote the normalized transcript level of the gene coding for the enzyme catalyzing reaction $j$ upon the environmental change. A value greater than one implies overexpression, while a value less than one denotes underexpression. For the sake of simplicity of presentation, a one-to-one mapping of genes to reactions is assumed here. This can be easily relaxed if necessary. Consider the binary variable $w_j$ defined as

$$
w_j =
\begin{cases}
1, & \text{if the transcript and flux level changes are in the same direction} \\
0, & \text{otherwise}
\end{cases}
$$

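Given the definition of the binary variables $w_j$, we can then write

$$v_j \ge v_j^0 - (1 - w_j)\, v_j^0 \qquad \text{if } T_j > 1 \tag{4}$$

$$v_j \le v_j^0 + (1 - w_j)\,(v_j^{\max} - v_j^0) \qquad \text{if } T_j < 1 \tag{5}$$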

where $v_j^0$ is the base flux level and $v_j^{\max}$ a maximum allowable value. For $w_j = 1$ these two constraints correctly enforce $v_j \ge v_j^0$ if $T_j > 1$ and $v_j \le v_j^0$ if $T_j < 1$, respectively. For $w_j = 0$ the two constraints yield obviously valid, non-binding constraints $v_j \ge 0$ and $v_j \le v_j^{\max}$, respectively. Perfect correlation between transcript and flux levels would have implied that all $w_j$ are equal to one. However, experimental studies have demonstrated that not 100% but rather on average about 80% of the genes exhibit transcript and flux levels changing in the same direction. Moreover, the further from unity the value of the transcript level is, the more likely it is to agree with the flux change direction. This motivates a probabilistic description for quantifying the likelihood that transcript changes translate to corresponding flux level changes in the same direction. Specifically, we will construct a statistical model of the form,



The scale $T^{\text{scale}}$ is chosen so as to control the range over which $P_j$
remains away from one. A value of 0.622 implies that 100% overexpression ($T_j = 2$)
or underexpression ($T_j = 1/2$) confers a 90% probability of unidirectionality.
Figure 7 plots the probability $P_j$ of having unidirectional transcript and flux
changes as a function of the transcript level $T_j$. Note that for $T_j = 1$, $P_j = 0.5$,
reflecting an equal chance for either outcome, whereas when $T_j$ has very large
positive or negative exponents, $P_j$ approaches one. The present invention
contemplates that more elaborate models for $P_j$ can be used, including those
constructed by borrowing from mechanistic methods or other methods
developed for linking transcript ratios to flux changes.
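The functional form of the statistical model is not legible in this copy. The sketch below is a minimal illustration under an assumed form, $P_j = 1 - \tfrac{1}{2}\exp(-|\log_2 T_j| / T^{\text{scale}})$, chosen only because it reproduces the properties stated above: $P_j = 0.5$ at $T_j = 1$, $P_j \approx 0.9$ at $T_j = 2$ or $T_j = 1/2$ when $T^{\text{scale}} = 0.622$, and $P_j \to 1$ for large positive or negative exponents. The function name is hypothetical.

```python
import math


def unidirectional_probability(t_ratio: float, t_scale: float = 0.622) -> float:
    """Probability that transcript and flux change in the same direction.

    ASSUMED form (not taken from the source text):
        P = 1 - 0.5 * exp(-|log2(T)| / t_scale)
    """
    if t_ratio <= 0:
        raise ValueError("transcript ratio T must be positive")
    # |log2 T| measures the distance of T from unity on a fold-change scale,
    # so T = 2 (overexpression) and T = 1/2 (underexpression) are treated alike.
    distance = abs(math.log2(t_ratio))
    return 1.0 - 0.5 * math.exp(-distance / t_scale)


for t in (1.0, 2.0, 0.5, 8.0):
    print(f"T = {t:5.2f}  P = {unidirectional_probability(t):.3f}")
```

With the default scale this assumed form returns exactly 0.5 for $T_j = 1$ and about 0.90 for both $T_j = 2$ and $T_j = 0.5$.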

After using the $P_j$ probabilities to weigh the effect of each $w_j$, the
following constraint is obtained:



Here $\alpha$ is the fraction of genes $j$ expected to have unidirectional
transcript and flux changes. Thus, $\alpha$ quantifies the "agreeability" between the
transcript ratios and the flux changes predicted by FBA. Augmenting FBA
with constraints (4), (5) and (6) superimposes, in a probabilistic sense, the
qualitative information encoded in the gene expression profiles of DNA
microarray experiments. The probabilistic framework described above can be
employed in two different ways.

The optimization of FBA models typically yields numerous alternate
optima. An elegant algorithm for identifying all of them has been proposed by
Lee et al. (2000). The DNA microarray data can be used to identify the subset
of alternate optima that are consistent with the experimentally determined
genetic expression levels. Specifically, the parameter $\alpha$ can be used to rank
the multiple optima (Lee et al., 2000) typically obtained after the optimization
of an objective function within an FBA model with respect to their agreeability
with the DNA array data. Results with the data from Oh & Liao (2000) for the
transition from growth on glucose to growth on acetate show that $\alpha$ can vary
from 0.74 to 0.89 for the imposed growth rate of 0.3 hr$^{-1}$, depending on which
alternate optimal solution is identified by the solver. Thus the agreeability between
the FBA solution and the expression profile can be improved by as much as 20% by
maximizing $\alpha$ for a given FBA optimal objective value. The present invention also
provides for the direct incorporation of the DNA microarray data into the FBA model.
Here the sensitivity of the FBA objective to the imposed agreeability with the
experimental transcript profiles can be explored by constraining the model to
meet various values of $\alpha$. Results, shown in Figure 8, for the same data from
Oh & Liao (2000) show a quadratic trend between the minimum acetate
uptake rate and the imposed agreeability parameter, $\alpha$.
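The ranking of alternate optima by agreeability can be sketched as follows. Gene $j$ is marked as agreeing ($w_j = 1$) when its flux change relative to the base flux $v_j^0$ has the direction required by constraints (4) and (5). As a simplifying assumption, each optimum is scored by the unweighted fraction of agreeing genes; the $P_j$-weighted constraint (6) is not reproduced here. All function names and data are hypothetical.

```python
from typing import List, Sequence


def agrees(t_ratio: float, v_base: float, v: float) -> bool:
    """w_j = 1 test from (4)/(5): v >= v^0 if T > 1, v <= v^0 if T < 1."""
    if t_ratio > 1:
        return v >= v_base
    if t_ratio < 1:
        return v <= v_base
    return True  # T_j == 1: no transcript change, nothing to contradict


def agreeability(t: Sequence[float], v_base: Sequence[float], v: Sequence[float]) -> float:
    """Unweighted fraction of genes with T_j != 1 whose flux change agrees (assumption)."""
    changed = [j for j, tj in enumerate(t) if tj != 1]
    if not changed:
        return 1.0
    hits = sum(agrees(t[j], v_base[j], v[j]) for j in changed)
    return hits / len(changed)


def rank_alternate_optima(t, v_base, optima: List[Sequence[float]]):
    """Order alternate optimal flux vectors by decreasing agreeability."""
    return sorted(optima, key=lambda v: agreeability(t, v_base, v), reverse=True)


# Hypothetical data: four genes and two alternate optima with the same objective.
T = [1.6, 0.5, 2.0, 0.8]
V_BASE = [1.0, 2.0, 0.5, 1.0]
OPTIMA = [[1.2, 2.5, 0.4, 0.9], [1.5, 1.0, 0.7, 1.1]]

for v in rank_alternate_optima(T, V_BASE, OPTIMA):
    print(v, agreeability(T, V_BASE, v))
```

Here the second optimum agrees with three of the four transcript changes and is ranked first; the first optimum agrees with only two.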

5. IDENTIFYING GENE CANDIDATES FOR RECOMBINATION

The explosive growth of annotated genes associated with metabolism
calls for a systematic procedure for determining the most promising
recombination choices. Until now, recombinant DNA technology has been used
to add straightforward conversion pathways which introduce new and
desirable cellular functions. Here the objective is to utilize flux balance
analysis and mixed-integer programming tools to select the mathematically
optimal genes for recombination into E. coli or other prokaryotes from a
metabolic database encompassing many genes from multiple species. The
resulting pathways need not lie directly on main production pathways, as they
may enhance production indirectly by either redirecting metabolic fluxes into
the production pathways or by increasing the energy efficiency of the present
pathways.

A comprehensive stoichiometric matrix containing all known metabolic
reactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG)
(Kanehisa and Goto, 2000), EcoCyc (Karp et al., 2000), and other sources
can be compiled and incorporated into the flux balance model of the model
organism (e.g., E. coli). We refer to this multi-species stoichiometric matrix as
the Universal stoichiometric matrix. This multi-species stoichiometric matrix
is a valuable resource for exploring in silico gene recombination alternatives
and examining which prokaryote will be the most advantageous choice for a
given bioprocessing application.

Selecting up to $h$ new genes to recombine into the host organism so that
a metabolic objective is maximized can be formulated as an MILP problem.
This is accomplished by augmenting the LP flux balance model with the constraint
$y_k = 1, \ \forall k \in E$, which ensures that all E. coli genes are present, as well as
the constraints

$$\sum_{k \in NE} y_k \le h, \qquad 0 \le v_k \le v_k^{\max} y_k, \quad \forall k \in NE$$

that allow up to $h$ foreign genes to be incorporated in E. coli out of the
comprehensive list contained in the Universal matrix (i.e., NE). Here the host
organism is assumed to be E. coli, but in general any annotated prokaryotic
microbe can be selected as the host organism. Reactions chosen by the model
but absent in E. coli (i.e., all non-zero $y_k$ elements of NE) provide routes for
manipulating the cellular metabolism through recombinant DNA technology.
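A toy instance of this gene-addition MILP can be solved with SciPy's `scipy.optimize.milp` (SciPy 1.9 or later). The network, reaction names and bounds below are hypothetical: host reactions carry no binary variables, which is equivalent to fixing $y_k = 1$ for $k \in E$, while each foreign candidate gets a binary $y_k$, a linking constraint $v_k \le v^{\max} y_k$ with an assumed big-M value $v^{\max}$, and the cardinality constraint $\sum_{k \in NE} y_k \le h$.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical toy network. Internal metabolites: A, B, P.
# Host (E) reactions: uptake (-> A), R1 (A -> B), R2 (B -> P), out (P ->).
# Foreign candidates (NE): F1 (A -> 2 B), F2 (B -> 3 P), F3 (A -> P).
REACTIONS = ["uptake", "R1", "R2", "out", "F1", "F2", "F3"]
FOREIGN = [4, 5, 6]  # column indices of the NE reactions
S = np.array([
    # up  R1  R2  out  F1  F2  F3
    [1, -1, 0, 0, -1, 0, -1],  # A
    [0, 1, -1, 0, 2, -1, 0],   # B
    [0, 0, 1, -1, 0, 3, 1],    # P
], dtype=float)


def best_additions(h, v_max=1000.0, uptake_max=10.0):
    """Maximize the 'out' flux while switching on at most h foreign reactions."""
    n_v, n_y = S.shape[1], len(FOREIGN)
    n = n_v + n_y  # decision vector: fluxes v followed by binaries y_k (k in NE)

    c = np.zeros(n)
    c[REACTIONS.index("out")] = -1.0  # milp minimizes, so negate the objective

    # Steady state: S v = 0 for every internal metabolite.
    balance = np.hstack([S, np.zeros((S.shape[0], n_y))])
    # Linking: v_k - v_max * y_k <= 0, so y_k = 0 forces v_k = 0.
    linking = np.zeros((n_y, n))
    for row, k in enumerate(FOREIGN):
        linking[row, k] = 1.0
        linking[row, n_v + row] = -v_max
    # Cardinality: sum of y_k over NE is at most h.
    cardinality = np.r_[np.zeros(n_v), np.ones(n_y)].reshape(1, -1)

    constraints = [
        LinearConstraint(balance, 0.0, 0.0),
        LinearConstraint(linking, -np.inf, 0.0),
        LinearConstraint(cardinality, -np.inf, float(h)),
    ]
    upper = np.r_[np.full(n_v, v_max), np.ones(n_y)]
    upper[REACTIONS.index("uptake")] = uptake_max
    res = milp(
        c,
        constraints=constraints,
        integrality=np.r_[np.zeros(n_v, dtype=int), np.ones(n_y, dtype=int)],
        bounds=Bounds(np.zeros(n), upper),
    )
    chosen = [REACTIONS[k] for row, k in enumerate(FOREIGN) if res.x[n_v + row] > 0.5]
    return -res.fun, chosen


for h in range(3):
    value, genes = best_additions(h)
    print(f"h = {h}: max product flux = {value:.1f}, added = {genes}")
```

In this toy case the host pathway alone yields 10 units of product; allowing one foreign gene selects F2 (30 units) and allowing two selects F1 and F2 (60 units), illustrating how the best choice depends on $h$.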

Preliminary results using the flux balance E. coli model of Pramanik
and Keasling demonstrate that improvements to seven amino acid production
pathways of E. coli are theoretically attainable with the addition of genes from
foreign organisms (see the table of Figure 9). J. Pramanik and J. D. Keasling,
Biotech. Bioeng. 56, 398 (1997).

In most cases, only one or two genes were added to the original amino
acid production pathway, even though the complete list of 3,400 reactions was
available for selection. The mechanism of all identified enhancements is either
(i) improving the energy efficiency and/or (ii) increasing the carbon
conversion efficiency of the production route. Manipulation of the arginine
pathway showed the most promise, with 8.75% and 9.05% improvements for
growth on glucose and acetate, respectively. Figure 10 shows the pathway
modifications introduced in the recombined network for growth on glucose.



Overall, the additional genes used by the Universal model save the original
pathway three net ATP bonds, increasing arginine production by 8.75%.
Similar trends are revealed when other native and Universal amino acid
production routes for glucose and acetate substrates are examined.

The models of the present invention that have been described can also
be extended to encompass more gene candidates for recombination as they
become available through ongoing genome projects. The present invention
applies to any number of organisms, including other microorganisms of
industrial significance. Even though E. coli is one of the most industrially
significant microorganisms, other microbes confer advantages due to their
relaxed regulatory mechanisms. For example, various species of the genera
Corynebacterium and Brevibacterium have been employed to produce
glutamate by exploiting a phospholipid-deficient cytoplasmic membrane
that enables the secretion of glutamate into the medium. Riboflavin (vitamin
B2) overproducers include Eremothecium ashbyii and Ashbya gossypii, in which
no repressive effects from ferrous ion are observed.

The logic-based constraints of the present invention can be integrated
with the gene selection MILP formulation to tighten the obtained predictions.
By contrasting the optimal recombination changes identified for the production
of different amino acids, recombination strategies that point towards
simultaneous yield improvements of multiple amino acids are identified. The
invention's optimization framework for guiding gene additions provides the
quantitative means to study flux enhancements through foreign gene
recombination from an ever-expanding database of available genes. Although
complete gene-enzyme relationships are not currently known, the formulation
allows the incorporation of this information as it becomes available.

6. GENE DELETIONS (MINIMAL SETS)

The recent explosion of fully sequenced genomes has brought significant
attention to the question of how many genes are necessary for sustaining
cellular life. A minimal genome is generally defined as the smallest set of
genes that allows for replication and growth in a particular environment.
Attempts to uncover this minimal gene set have included both experimental
and theoretical approaches. Theoretical methods are based on the hypothesis
that genes conserved across large evolutionary boundaries are vital to cellular
survival. Based on this hypothesis, a minimal set of 250 genes was compiled
by assuming that genes common to both M. genitalium and Haemophilus
influenzae must be members of a minimal genome. A. R. Mushegian and E. V.
Koonin, P. Natl. Acad. Sci. USA 93, 1026 (1996).

Interestingly, however, only 6 out of 26 E. coli open reading frames of
unknown function conserved in M. genitalium were deemed essential to
species survival. F. Arigoni et al., Nat. Biotechnol. 16, 851 (1998). The
existence of multiple, quite different, species- and environment-specific minimal
genomes has long been speculated. M. Huynen, Trends Genet. 16, 116
(2000). The present invention provides a computational procedure for
testing this claim by estimating the minimum life-sustaining core of metabolic
reactions required for given growth rates under different uptake conditions.
This problem can be formulated as the following optimization problem:

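$$\min \sum_{j=1}^{M} y_j \quad \text{subject to} \quad \sum_{j=1}^{M} S_{ij} v_j = 0, \quad i = 1,\ldots,N$$
$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}, \qquad 0 \le v_j \le v_j^{\max} y_j, \quad y_j \in \{0,1\}, \quad j = 1,\ldots,M$$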
that solves for the smallest set of metabolic reactions that satisfies the
stoichiometric constraints and meets a biomass target production rate
$v_{\text{biomass}}^{\text{target}}$. Alternatively, instead of a biomass target, minimum levels of ATP
production or lowest allowable levels of key components/metabolites could be
incorporated in the model. One novel feature of this aspect of the invention is
that whereas previous attempts utilized reductionist methodologies to extract the
set of essential genes through a series of gene knock-outs, here we simultaneously
assess the effect of all reactions on biomass production and select the minimal
set that meets a given growth rate target (whole-system approach). A minimal
gene set can then be inferred by mapping the enzyme(s) catalyzing these
reactions to the corresponding coding genes.
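A toy instance of the minimal-reaction-set formulation can be solved with `scipy.optimize.milp` (SciPy 1.9 or later). The network and capacities are hypothetical: the direct route r5 (A to X) is assumed to be capped at $v^{\max} = 2$, so a small biomass target is met by three reactions while a larger target forces a longer four-reaction route, a qualitative echo of the growth-dependent trend described in the text.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical toy network; internal metabolites A, B, C, X.
REACTIONS = ["up", "r1", "r2", "r3", "r4", "r5", "bio"]
S = np.array([
    # up  r1  r2  r3  r4  r5  bio
    [1, -1, 0, -1, 0, -1, 0],   # A  (up: -> A, r1: A -> B, r3: A -> C, r5: A -> X)
    [0, 1, -1, 0, 1, 0, 0],     # B  (r2: B -> X, r4: C -> B)
    [0, 0, 0, 1, -1, 0, 0],     # C
    [0, 0, 1, 0, 0, 1, -1],     # X  (bio: X -> biomass drain)
], dtype=float)
V_MAX = np.array([10, 100, 100, 100, 100, 2, 100], dtype=float)  # r5 capped at 2


def minimal_reaction_set(target):
    """Smallest set of active reactions with v_bio >= target, or None."""
    m = S.shape[1]
    c = np.r_[np.zeros(m), np.ones(m)]  # minimize the number of active reactions
    balance = np.hstack([S, np.zeros_like(S)])  # S v = 0
    linking = np.hstack([np.eye(m), -np.diag(V_MAX)])  # v_j - v_j^max * y_j <= 0
    lower = np.zeros(2 * m)
    lower[REACTIONS.index("bio")] = target  # biomass target v_bio >= target
    res = milp(
        c,
        constraints=[
            LinearConstraint(balance, 0.0, 0.0),
            LinearConstraint(linking, -np.inf, 0.0),
        ],
        integrality=np.r_[np.zeros(m, dtype=int), np.ones(m, dtype=int)],
        bounds=Bounds(lower, np.r_[V_MAX, np.ones(m)]),
    )
    if not res.success:
        return None  # target not reachable with the available uptake
    return [r for r, y in zip(REACTIONS, res.x[m:]) if y > 0.5]


for target in (1.0, 5.0, 12.0):
    print(target, minimal_reaction_set(target))
```

A target above the uptake limit of 10 is infeasible, so the sketch returns `None` in that case.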

Results based on the E. coli FBA model of Edwards and Palsson for the
first time quantitatively demonstrated that minimal reaction sets, and thus the
corresponding minimal gene sets, are strongly dependent on the uptake
opportunities afforded by the growth medium and the imposed growth
requirements. J. S. Edwards and B. O. Palsson, Proc. Natl. Acad. Sci. USA 97,
5528 (2000). Specifically, the minimal reaction network (a subset of only E. coli
reactions) was explored for different growth requirements under two
contrasting uptake environments: (a) restricting the uptake of organic material
to glucose only and (b) allowing the uptake of any organic metabolite with a
corresponding transport reaction. These two extreme uptake scenarios were
chosen to model maximum and minimum reliance on internal metabolism for
component synthesis and to probe its effect on the minimum reaction set
required. The minimum number of metabolic reactions as a function of the
imposed biomass growth target (as a % of the theoretical maximum) for the
two uptake choices is shown in Figure 11.

While it is predicted that an E. coli cell grown on a medium containing
only glucose requires at least 226 metabolic reactions to support growth, a cell
cultured on a rich, optimally engineered medium could support growth with as
few as 124 metabolic reactions. As expected, the minimal reaction set becomes
larger as the required growth rate increases. However, the magnitude of this
increase is quite different for the two cases. In case (a) the minimal reaction
set increases only from 226 to 236 to meet the maximum growth rate; however,
in case (b) the minimal reaction set almost doubles, going from 124 to 203.
Furthermore, neither the minimal reaction sets nor their corresponding
reaction fluxes were found to be unique. Even after excluding cycles and
isoenzymes, hundreds of multiple minimal sets were identified, providing a
computational confirmation of the astounding redundancy and flux redirection
versatility of the E. coli network. More importantly, for case (a) all minimal
reaction sets identified included 11 out of 12 reactions whose corresponding
gene deletions were determined experimentally to be lethal for growth on
glucose. Earlier analyses (Edwards and Palsson, 2000) based on single gene
deletions conducted with this model using LP optimization were able to
identify only 7 out of 12 lethal gene deletions, underscoring the importance of
considering simultaneous gene deletions within an MILP framework.

The present invention contemplates that this framework can be built on
by constructing different minimal reaction sets not just for E. coli but for other
species separated by wide evolutionary boundaries. By contrasting the
obtained minimal sets, a comparison of minimal reaction sets (metabolic gene
sets) along different evolutionary branches can be made. For example,
organisms such as M. genitalium and H. influenzae can be used, with results
benchmarked against earlier studies (Mushegian and Koonin, 1996). By
lumping reactions occurring in many different species within the Universal
stoichiometric matrix described earlier, a species-independent minimal
metabolic reaction set can also be constructed. The predicted E. coli-based
metabolic minimal set of 124 reactions/genes is comparable to the 94 metabolic
genes included in the minimal gene set proposed by Mushegian and Koonin
(1996). The present invention contemplates that this prediction gap can be
reduced by (i) identifying more efficient reaction combinations, including those
occurring in non-E. coli species, and (ii) uncovering genes that are involved
in the uptake or secretion of multiple (similar) metabolites, reducing the total
count. Clearly, the proposed computational framework is dependent upon a
reaction-based analysis, which inherently cannot account for genes associated
with translation, replication, recombination, repair, transcription, cellular
structure and genes of unknown function. However, it does afford the
versatility to study different uptake/secretion environments as well as to
encompass reaction sets from multiple species in the search for the metabolic
minimal genome, providing valuable insight and perspective on the questions of
what the minimal genome is and how it is shaped by the environment. As
more elaborate models are developed describing elementary functions of
minimal cells, such as the work of Browning and Shuler on the initiation of
DNA replication, more detail will be added to the modeling framework. S. T.
Browning and M. L. Shuler, AIChE Annual Meeting, Session 69, Session 60,
Los Angeles (2000).

Apart from developing a framework for rationally identifying "minimal"
metabolic networks, we also intend to exploit the capability of predicting in
silico lethal gene deletions for different organisms and uptake environments.
By identifying lethal gene deletions for pathogenic microbes (e.g., H. pylori) as a
function of the environment, a ranked list of promising targets for
therapeutic intervention (i.e., interruption of gene expression) can be compiled.
This list can further be refined by imposing constraints ensuring that human
metabolism is not adversely affected by repressing the expression of any of
the pathogen genes included in the list.

7. MIXED-INTEGER LINEAR PROGRAMMING SOLUTION
TECHNIQUES

The modeling framework of the present invention further provides
computational procedures for solving the network problems presented.
These procedures include mixed-integer linear programming techniques.

The algorithmic frameworks of the present invention in the context of
gene addition, regulation, DNA array data superposition, genetic circuit
elucidation and minimal reaction set identification inherently require the use
of discrete optimization variables that give rise to MILP problems. Unlike LP
problems, which can be routinely solved even for hundreds of thousands of
variables by employing commercial solvers (e.g., OSL, CPLEX, LINDO, etc.)
with minimal or no user intervention, MILP problems are much more
computationally challenging, typically requiring not just more CPU time but
also user intervention. Specifically, it is typically necessary to (i) cast the
problem in a form that is more amenable to MILP solution techniques, and (ii)
if the problem is still intractable for commercial solvers, construct
customized solution methodologies.



The key source of complexity in MILP problems in metabolic networks is
the number of reactions/genes whose on or off switching, as well as the prediction
of over- or under-expression, requires binary 0-1 variables to describe. These
problems belong to the class of generalized network problems (Ahuja et al.,
1993), where each metabolite constitutes a node and each reaction represents
an arc in the network. The fact that existing FBA models for prokaryotes
(Edwards and Palsson, 2000) contain hundreds of reactions, and that upcoming
models for S. cerevisiae will likely contain thousands, motivates the need to
harness complexity. In addition, the tremendous redundancy, redirection
capability and multiplicity of steady-state solutions further exacerbate
complexity issues. In light of these challenges, some of the problems addressed
by the present invention so far, particularly in the context of the minimal
reaction sets, required CPU times on the order of 50 hours.

A number of preprocessing and reformulation techniques can be used
according to the present invention to alleviate the computational burden.
These techniques include isoenzyme grouping, futile cycle exclusion and
network connectivity constraints. Isoenzyme grouping refers to the
aggregation of reactions differing only in the catalyzing enzyme (i.e.,
isoenzymes) into a single reaction. This reduces complexity by pruning the total
number of binary variables. Futile cycle exclusion addresses the removal of
sets of reactions (2 or more) which collectively recycle fluxes in a loop without
any net effect on metabolism or energy generation. In general, a set $\mathcal{K}$
composed of $K$ reactions forms a futile cycle if

$$\sum_{j \in \mathcal{K}} S_{ij} = 0, \quad \forall i = 1,\ldots,N$$

The following constraint:

$$\sum_{j \in \mathcal{K}} y_j \le K - 1$$

inactivates at least one reaction, breaking the cycle.
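The two preprocessing steps above can be sketched in plain Python on a hypothetical set of stoichiometric columns. Isoenzymes are detected as reactions with identical columns; futile cycles are found by exhaustive enumeration of small reaction sets whose columns sum to zero for every metabolite, reporting only minimal cycles, and each cycle yields the integer cut $\sum_{j \in \mathcal{K}} y_j \le K - 1$. The enumeration is exponential and only meant for illustration; note that the forward and backward halves of a split reversible reaction would also be reported as a two-reaction cycle.

```python
from itertools import combinations

# Hypothetical stoichiometric columns over metabolites (A, B, C).
COLUMNS = {
    "R1a": [-1, 1, 0],  # A -> B, isoenzyme a
    "R1b": [-1, 1, 0],  # A -> B, isoenzyme b (identical column)
    "R2": [0, -1, 1],   # B -> C
    "R3": [1, 0, -1],   # C -> A
    "R4": [0, 1, -1],   # C -> B
}


def group_isoenzymes(columns):
    """Reactions with identical columns differ only in the catalyzing enzyme."""
    groups = {}
    for name, col in columns.items():
        groups.setdefault(tuple(col), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


def futile_cycles(columns, max_size=3):
    """Minimal reaction sets K with sum_{j in K} S_ij = 0 for every metabolite i."""
    found = []
    for size in range(2, max_size + 1):
        for subset in combinations(columns, size):
            if any(set(c) <= set(subset) for c in found):
                continue  # contains a smaller cycle that was already reported
            if all(sum(vals) == 0 for vals in zip(*(columns[r] for r in subset))):
                found.append(subset)
    return found


def cycle_cut(subset):
    """Integer cut sum_{j in K} y_j <= |K| - 1 as (coefficients, right-hand side)."""
    return {r: 1 for r in subset}, len(subset) - 1


print(group_isoenzymes(COLUMNS))
for cycle in futile_cycles(COLUMNS):
    print(cycle, cycle_cut(cycle))
```

Grouping R1a and R1b first would merge the two three-reaction cycles into one, which is why isoenzyme grouping also shrinks the number of cycle cuts.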

Connectivity constraints ensure that if a reaction producing an
intracellular metabolite is active, then at least one reaction consuming this
metabolite must be active, and vice versa. In addition, if a reaction
transporting an extracellular metabolite into the cell is active, then at least
one intracellular reaction consuming this metabolite must be active, and vice
versa.
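One way to write these logic conditions as linear constraints is $y_r \le \sum_{s} y_s$, where $r$ is a producer (or consumer) of an intracellular metabolite and $s$ ranges over its consumers (or producers); an empty partner list then forces $y_r = 0$. The plain-Python sketch below builds these pairs for a hypothetical network and checks a candidate set of active reactions. The transport case is covered because the transport reaction tA is simply a producer of intracellular A.

```python
# Hypothetical network; each column maps metabolite -> stoichiometric coefficient.
COLUMNS = {
    "tA": {"A": 1},               # transport of extracellular A into the cell
    "R1": {"A": -1, "B": 1},      # A -> B
    "R2": {"B": -1, "C": 1},      # B -> C
    "drain": {"C": -1},           # C consumed by a biomass/secretion drain
}
INTERNAL = ["A", "B", "C"]


def connectivity_constraints(columns, internal):
    """Pairs (r, partners) encoding y_r <= sum(y_s for s in partners)."""
    cons = []
    for met in internal:
        producers = [r for r, col in columns.items() if col.get(met, 0) > 0]
        consumers = [r for r, col in columns.items() if col.get(met, 0) < 0]
        cons += [(r, consumers) for r in producers]  # active producer needs a consumer
        cons += [(r, producers) for r in consumers]  # active consumer needs a producer
    return cons


def violations(cons, active):
    """Constraints broken by a given set of active reactions."""
    return [(r, p) for r, p in cons if r in active and not any(s in active for s in p)]


CONS = connectivity_constraints(COLUMNS, INTERNAL)
print(violations(CONS, {"tA", "R1"}))  # R1 makes B but nothing consumes it
print(violations(CONS, {"tA", "R1", "R2", "drain"}))
```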

State-of-the-art commercial MILP solvers such as CPLEX 6.1 and OSL,
running on a multiprocessor UNIX platform (an IBM RS6000-270 workstation), can
be used to solve these types of problems. For problem sizes that are
intractable with commercial MILP solvers, customized decomposition
approaches can be used. For example, Lagrangean relaxation and/or
decomposition by partitioning the original metabolic network into subnetworks
loosely interconnected through only a handful of metabolites can be used. By
iteratively solving many smaller problems instead of one large one,
computational savings are expected. Further, the present invention
contemplates the use of disjunctive programming approaches, which combine
Boolean with continuous variables. These methods have been shown to be
particularly effective for MILP problems where all the 0-1 (i.e., Boolean)
variables are aggregated into logic constraints, as is the case with many of the
MILP formulations of the present invention.

8. EXAMPLE: PROBING THE PERFORMANCE LIMITS OF THE
ESCHERICHIA COLI METABOLIC NETWORK SUBJECT TO GENE
ADDITIONS OR DELETIONS

The framework of the present invention can be applied to a number of
metabolic network problems in a number of different contexts. The present
invention has been used to probe the performance limits of the E. coli
metabolic network subject to gene additions or deletions. According to this
example, an optimization-based procedure for studying the response of
metabolic networks after gene knockouts or additions is introduced and
applied to a linear flux balance analysis (FBA) E. coli model. Both the gene
addition problem of optimally selecting which foreign genes to recombine into
E. coli and the gene deletion problem of removing a given number of
existing ones are formulated as mixed-integer optimization problems using
binary 0-1 variables. The developed modeling and optimization framework is
tested by investigating the effect of gene deletions on biomass production and
addressing the maximum theoretical production of the twenty amino acids for
aerobic growth on glucose and acetate substrates. In the gene deletion study,
the smallest gene set necessary to achieve maximum biomass production in E.
coli is determined for aerobic growth on glucose. The subsequent gene
knockout analysis indicates that biomass production decreases monotonically,
rendering the metabolic network incapable of growth after only 18 gene
deletions.

In the gene addition study, the E. coli flux balance model is augmented
with 3,400 non-E. coli reactions from the KEGG database to form a
multi-species model. This model is referred to as the Universal model. This study
reveals that the maximum theoretical production of six amino acids could be
improved by the addition of only one or two genes to the native amino acid
production pathway of E. coli, even though the model could choose from 3,400
foreign reaction candidates. Specifically, manipulation of the arginine
production pathway showed the most promise, with 8.75% and 9.05% predicted
increases with the addition of genes for growth on glucose and acetate,
respectively. The mechanism of all suggested enhancements is either
(i) improving the energy efficiency and/or (ii) increasing the carbon conversion
efficiency of the production route.

This example according to the framework of the present invention uses
flux balance analysis and mixed-integer programming tools to select the
mathematically optimal genes for recombination into E. coli from a metabolic
database encompassing many genes from multiple species. The resulting
pathways need not lie directly on main production pathways, as they may
enhance production indirectly by either redirecting metabolic fluxes into the
production pathways or by increasing the energy efficiency of the present
pathways.

The recent upsurge of sequenced genomes has also brought significant
attention to the question of which genes are crucial for supporting cellular life.
Flux balance analysis modeling provides a useful tool to help elucidate this
question. Although FBA models cannot simulate the regulatory structure
alterations associated with gene deletions, these models can capture whether
sufficient network connectivity exists to produce metabolites critical to cellular
survival. In fact, a recent FBA model proposed by Edwards & Palsson (2000)
was able to qualitatively predict the growth patterns of 86% of the mutant E.
coli strains examined. This model was also used to identify some of the
essential gene products of central metabolism for aerobic and anaerobic E. coli
growth on glucose. J. S. Edwards and B. O. Palsson, BMC Bioinformatics
1, 1 (2000).

Determining the maximum number of tolerable gene deletions in a
given metabolic system, however, requires a discrete optimization strategy in
which multiple gene deletions can be simultaneously examined. A related
approach utilizing discrete optimization to identify all alternate optima in
linear metabolic models has been proposed by Lee et al. (2000).

According to the present invention, we examine how the stoichiometric
boundaries of cellular performance expand or contract in the presence of
multiple gene additions or deletions. An FBA model of the cellular metabolism
of E. coli is constructed incorporating the reaction pathways provided by
Pramanik and Keasling (1997) along with modifications suggested by Karp
(1999) based on more recent data. The modifications are either small molecule
corrections based on more recent metabolic information or the removal of
certain pathways now known to be absent from the E. coli genotype. A
stoichiometric matrix as suggested by Schilling containing all metabolic
reactions from the Kyoto Encyclopedia of Genes and Genomes is compiled and
incorporated into the model. C. H. Schilling et al., Biotech. Prog. 15, 288
(1999). We refer to this multi-species stoichiometric matrix as the Universal
stoichiometric matrix. A short discussion of flux balance analysis will be
presented next, followed by the gene addition and deletion formulations and
their application to biomass and amino acid production in E. coli.





8.1 Flux Balance Analysis 

Flux balance analysis (FBA) requires only the stoichiometry of
biochemical pathways and cellular composition information to identify
boundaries for the flux distributions available to the cell. Although
microorganisms have evolved highly complex control structures which
eventually collapse these available boundaries into single points, FBA models
are still valuable in setting upper bounds for performance targets and in
identifying "ideal" flux distributions. The underlying principle of FBA is the
imposition of mass balances on the metabolites of interest. For a metabolic
network composed of $N$ metabolites and $M$ metabolic reactions we have

$$\sum_{j=1}^{M} S_{ij} v_j = b_i, \quad \forall i = 1,\ldots,N \qquad (7)$$

where $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_j$
represents the flux of reaction $j$, and $b_i$ quantifies the network's uptake (if
negative) or secretion (if positive) of metabolite $i$. For all internal metabolites,
$b_i$ is zero. Reversible reactions are defined simply as two irreversible reactions
in opposite directions, constraining all fluxes to non-negative values.
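Equation (7) and the splitting of reversible reactions can be illustrated with a small plain-Python sketch. The network, reaction names and fluxes are hypothetical: R1 converts A to B and the reversible R2 (B to C) is split into forward and backward halves, each carrying a non-negative flux whose difference is the net flux.

```python
# Hypothetical columns over metabolites (A, B, C); R2 (B <-> C) is reversible.
COLUMNS = {"R1": [-1, 1, 0], "R2": [0, -1, 1]}


def split_reversible(columns, reversible):
    """Model each reversible reaction as two irreversible ones."""
    out = {}
    for name, col in columns.items():
        if name in reversible:
            out[name + "_fwd"] = list(col)
            out[name + "_bwd"] = [-s for s in col]
        else:
            out[name] = list(col)
    return out


def exchange_rates(columns, fluxes):
    """b_i = sum_j S_ij v_j, equation (7); b_i < 0 is uptake, b_i > 0 secretion."""
    n_met = len(next(iter(columns.values())))
    b = [0.0] * n_met
    for name, v in fluxes.items():
        if v < 0:
            raise ValueError(f"flux {name} must be non-negative")
        for i, s in enumerate(columns[name]):
            b[i] += s * v
    return b


IRREV = split_reversible(COLUMNS, {"R2"})
b = exchange_rates(IRREV, {"R1": 2.0, "R2_fwd": 3.0, "R2_bwd": 1.0})
print(b)  # [-2.0, 0.0, 2.0]: A is taken up, B is balanced, C is secreted
```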

Typically, the resulting flux balance system of equations is
underdetermined, as the number of reactions exceeds the number of
metabolites, and additional information is required to solve for the reaction
fluxes. Several researchers have measured external fluxes to add as
constraints to their underdetermined models, rendering them completely
determined or over-determined. H. Jorgensen et al., Biotechnol. Bioeng. 46,
117 (1995); E. Papoutsakis and C. Meyer, Biotechnol. Bioeng. 27, 50 (1985); E.
Papoutsakis and C. Meyer, Biotechnol. Bioeng. 27, 67 (1985); A. Pons et al.,
Biotechnol. Bioeng. 51(2), 177 (1996). However, additional assumptions, such
as removing reaction pathways, are often needed before external flux
measurements can completely define a system, and neglecting potentially
active pathways to render a system completely defined may cause large
changes in calculated fluxes (Pramanik, 1997). A popular technique for
investigating metabolic flux distributions is linear optimization (Varma, 1994).
The key conjecture is that the cell is capable of spanning all flux combinations
allowable by the stoichiometric constraints and thus achieving any flux
distribution that maximizes a given metabolic objective (e.g., biomass
production). The linear programming model for maximizing biomass
production is:

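$$\text{Maximize} \quad v_{\text{biomass}}$$
$$\text{subject to} \quad \sum_{j=1}^{M} S_{ij} v_j = b_i, \quad \forall i = 1,\ldots,N$$
$$v_j \ge 0, \quad \forall j = 1,\ldots,M$$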



where $v_{\text{biomass}}$ is a flux drain composed of all necessary components of biomass
in their appropriate biological ratios. Other objective functions, such as
maximizing metabolite production, maximizing biomass production for a given
metabolite production, and minimizing ATP production, have also been
investigated in the prior art.
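A minimal sketch of this linear program on a hypothetical four-reaction, three-metabolite network, solved with `scipy.optimize.linprog` and the HiGHS method (SciPy 1.6 or later). As an assumption, uptake is represented by a bounded transport reaction so that every internal metabolite has $b_i = 0$; with four reactions and three mass balances the system is underdetermined by one degree of freedom, which the LP resolves by maximizing $v_{\text{biomass}}$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network; internal metabolites G, P, ATP.
#   uptake:  -> G            (bounded; plays the role of a negative b_G)
#   glyc:    G -> 2 P + 2 ATP
#   resp:    P -> 3 ATP
#   biomass: P + 2 ATP ->    (biomass drain)
REACTIONS = ["uptake", "glyc", "resp", "biomass"]
S = np.array([
    [1, -1, 0, 0],   # G
    [0, 2, -1, -1],  # P
    [0, 2, 3, -2],   # ATP
], dtype=float)


def fba(uptake_max=10.0):
    """Maximize v_biomass subject to S v = 0 and v >= 0."""
    c = np.zeros(S.shape[1])
    c[REACTIONS.index("biomass")] = -1.0  # linprog minimizes, so negate
    bounds = [(0.0, uptake_max)] + [(0.0, None)] * (S.shape[1] - 1)
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return dict(zip(REACTIONS, res.x)), -res.fun


fluxes, growth = fba()
print(growth, fluxes)
```

For an uptake limit of 10 the optimum routes 4 units of P through respiration to supply the ATP needed for 16 units of biomass; halving the uptake halves the biomass flux.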

8.2 Escherichia coli stoichiometric models

Microbial stoichiometric models incorporate collections of reactions
known to occur in the studied species for simulating metabolism. The
complete sequencing of the E. coli genome makes it a model organism for the
study presented in this paper, because extensive knowledge regarding its
biochemical pathways is readily available. Varma and Palsson proposed the
first detailed FBA E. coli model capable of predicting experimental
observations. A. Varma and B. O. Palsson, J. Theor. Biol. 165, 503 (1993).
The stoichiometric matrix included 95 reversible reactions utilizing 107
metabolites for simulating glucose catabolism and macromolecule biosynthesis.
This model was used to investigate byproduct secretion of E. coli under
increasingly anaerobic conditions and was able to predict the right sequence of
byproduct secretion consistent with experimental findings: first acetate at
slightly anaerobic conditions, then formate, and finally ethanol at highly
anaerobic conditions. A. Varma et al., Appl. Environ. Microb. 59, 2465 (1993).
Building on the previous model, Pramanik and Keasling (1997) introduced a
model that incorporated 126 reversible reactions (including 12 reversible
transport reactions) and 174 irreversible reactions, as well as 280 metabolites.
Pramanik and Keasling (1997) correlated the macromolecule composition of E.
coli as a function of growth rate, and verified their model with experimental
data. The model successfully predicted several levels of genetic control, such as
the glyoxylate shunt closing for growth on glucose and the PEP carboxykinase
flux tending towards oxaloacetate. Furthermore, the glyoxylate shunt was
active during growth on acetate, while the flux through PEP carboxykinase was
toward phosphoenolpyruvate.

The stoichiometric E. coli model used in this study employs 178
irreversible, 111 reversible and 12 transport reactions compiled largely from
the model published by Pramanik and Keasling (1997). The modifications to
the Pramanik and Keasling stoichiometric matrix are given in the table of
Figure 12. They are primarily small molecule corrections (e.g., ATP in place of
GTP for succinate thiokinase) or the removal of reactions now known to be
absent from E. coli based on more recent data (Karp, 1999). Note that similar
changes were also independently included in the most recently published E.
coli model of Edwards and Palsson (2000). The metabolic network is fueled by
transport reactions allowing an unconstrained supply of ammonia, hydrogen
sulfate, and phosphate, along with a constrained supply of glucose or acetate, to
enter the system. Oxygen uptake is unconstrained to simulate aerobic
conditions. Unconstrained secretion routes for lactate, formate, ethanol,
glyceraldehyde, succinate, and carbon dioxide byproducts are provided by the
transport reaction fluxes. The Universal model is constructed by incorporating
3,400 cellular reactions from the Kyoto Encyclopedia of Genes and Genomes
into the modified Pramanik and Keasling stoichiometric model. The Universal
stoichiometric matrix contains all reactions known to occur in E. coli, as well as
a number of reactions from other organisms.

8.3 Mathematical Modeling of Gene Deletions/Additions

Practically every metabolic reaction is regulated to some extent by one
or more enzymes, produced by the translation of one or more genes. As a
result, the removal of certain genes from microbial DNA sequences can be fatal
or have little if any effect, depending upon the role of the enzymes coded for by
these genes. Conversely, the addition of certain genes through recombinant
DNA technology can have either no effect or produce novel desirable cellular
functionalities. Given a stoichiometric model of E. coli metabolism and the
Universal stoichiometric matrix encompassing reactions occurring in multiple
species, the goal of this section is to formulate a mathematical model that (i)
captures cellular robustness in the presence of multiple gene deletions, and (ii)
identifies additional genes from the Universal data set having the most
profound effect on improving a given metabolic objective.

First, define $\{k\} = \{1, \ldots, M, \ldots, T\}$ as the set of all possible genes, where
$M$ represents the number of E. coli genes and $T$ represents the total number of
genes in the data set. This set can be partitioned into two subsets, $E$ and $NE$,
where subset $E$ represents genes present in E. coli and subset $NE$ represents
genes present only in non-E. coli species:

$$E = \{ k \mid 1 \le k \le M \}$$

$$NE = \{ k \mid M + 1 \le k \le T \}$$

Subsequently, let binary variable $y_k$ describe the presence or absence of
each gene $k$:

$$y_k = \begin{cases} 0 & \text{if gene } k \text{ is not expressed in the host organism} \\ 1 & \text{if gene } k \text{ is present and functional} \end{cases}$$

The selection of the optimal gene choices for deletion or insertion through
DNA recombination can be determined by appropriately constraining the
number of non-zero elements in $y$. The case of removing a given number of
genes, $d$, from E. coli can be investigated by including the following constraint:

$$\sum_{k \in E} y_k \le M - d$$

This ensures that no more than $(M - d)$ genes are available to the
metabolic network. Similarly, the effect of introducing any number of
additional genes, $n$, can be investigated by utilizing:

$$y_k = 1, \quad \forall k \in E \tag{8}$$

$$\sum_{k \in NE} y_k \le n \tag{9}$$

Equation (8) allows all E. coli genes to be present and functional if necessary,
while equation (9) sets an upper limit to the number of allowable additions.
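The gene-presence vector and the two cardinality constraints can be checked directly on a candidate solution. The sketch below is a minimal illustration, assuming genes are indexed 1 to $T$ with the first $M$ forming $E$, and assuming the deletion constraint takes the form $\sum_{k \in E} y_k \le M - d$ (matching the statement that no more than $(M - d)$ genes remain available). The function names and toy sizes are hypothetical; in the formulation itself these conditions are imposed as constraints inside the optimization rather than checked afterwards.

```python
def gene_sets(M: int, T: int) -> tuple[range, range]:
    """Index ranges for E = {1, ..., M} (E. coli genes) and NE = {M+1, ..., T}."""
    return range(1, M + 1), range(M + 1, T + 1)


def deletion_ok(y: dict[int, int], M: int, T: int, d: int) -> bool:
    """Deletion study: no more than M - d E. coli genes remain available."""
    E, _ = gene_sets(M, T)
    return sum(y[k] for k in E) <= M - d


def addition_ok(y: dict[int, int], M: int, T: int, n: int) -> bool:
    """Addition study: every E. coli gene stays available (eq. 8) and at most
    n foreign genes are switched on (eq. 9)."""
    E, NE = gene_sets(M, T)
    return all(y[k] == 1 for k in E) and sum(y[k] for k in NE) <= n


# Hypothetical toy data: 4 E. coli genes (1-4) and 3 foreign genes (5-7).
M, T = 4, 7
y_del = {1: 1, 2: 0, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0}  # gene 2 knocked out
y_add = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 1}  # genes 5 and 7 added

print(deletion_ok(y_del, M, T, d=1))  # True
print(addition_ok(y_add, M, T, n=2))  # True
print(addition_ok(y_add, M, T, n=1))  # False: two additions exceed n = 1
```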

The optimal genes selected by the model are obtained by determining which
elements of $y$ in $NE$ are equal to one. In addition, since multiple genes often
correspond to a single reaction and occasionally multiple reactions are
catalyzed by an enzyme coded for by a single gene, the binary parameter $a_{jk}$ is
defined to describe which enzymes are coded for by which genes:

$$a_{jk} = \begin{cases} 0 & \text{if gene } k \text{ has no direct effect on reaction } j \\ 1 & \text{if gene } k \text{ codes for an enzyme catalyzing reaction } j \end{cases}$$

Parameter $a_{jk}$ establishes links between genetic functional assignments and
reactions. In order for a flux $v_j$ to take on a non-zero value, at least one gene
must code for an enzyme catalyzing this reaction ($a_{jk} = 1$) and this gene must
be present and functional in the host organism ($y_k = 1$). Given that at least one
gene must code for every enzyme, we have

$$\sum_k a_{jk} y_k \begin{cases} = 0 & \text{if no gene coding for the enzyme of reaction } j \text{ is functional} \\ \ge 1 & \text{if at least one gene coding for the enzyme of reaction } j \text{ is functional} \end{cases}$$

This implies that the following constraint,

$$L_j \Big( \sum_k a_{jk} y_k \Big) \le v_j \le U_j \Big( \sum_k a_{jk} y_k \Big), \quad \forall j$$

ensures that $v_j = 0$ if there exists no active gene $k$ capable of supporting
reaction $j$. In this case, $\sum_k a_{jk} y_k = 0$, which in turn forces the value of $v_j$ to zero.
Alternatively, if at least one such gene is functional, then $\sum_k a_{jk} y_k \ge 1$, allowing
$v_j$ to assume any value between a lower bound $L_j$ and an upper bound $U_j$. These
bounds are set by minimizing and maximizing, respectively, the given flux $v_j$ subject
to the stoichiometric constraints. These problems are solved using CPLEX 6.6
accessed via the commercial software package GAMS. Problems with up to
3700 binary variables were solved on an IBM RS6000-270 workstation.
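The linking constraint can be read as a rule that switches each reaction's flux range on or off depending on which genes are present. A minimal sketch, assuming a hypothetical toy map in which reaction R1 has two isoenzymes (genes g1 and g2) and a single gene g3 catalyzes both R2 and R3; the bounds $L_j$ and $U_j$ below are illustrative numbers, not values from the model.

```python
def reaction_bounds(a: dict[str, dict[str, int]], y: dict[str, int],
                    L: dict[str, float], U: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Bounds implied by L_j * sum_k a_jk y_k <= v_j <= U_j * sum_k a_jk y_k."""
    bounds = {}
    for j, genes in a.items():
        active = sum(a_jk * y[k] for k, a_jk in genes.items())  # sum_k a_jk y_k
        bounds[j] = (L[j] * active, U[j] * active)
    return bounds


# a[j][k] = 1 if gene k codes for an enzyme catalyzing reaction j (hypothetical).
a = {"R1": {"g1": 1, "g2": 1}, "R2": {"g3": 1}, "R3": {"g3": 1}}
L = {"R1": -5.0, "R2": 0.0, "R3": 0.0}
U = {"R1": 5.0, "R2": 10.0, "R3": 8.0}
y = {"g1": 0, "g2": 1, "g3": 0}  # g1 and g3 deleted

for j, (lo, hi) in reaction_bounds(a, y, L, U).items():
    print(j, lo, hi)
# R1 stays open through isoenzyme g2; deleting g3 forces both R2 and R3 to zero.
```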

8.4 Gene Knockout Study 

In this example according to the present invention, we determine the
smallest gene set capable of maximizing biomass production on a glucose
substrate (uptake basis: 10 mmol) and the maximum number of gene
deletions from this gene set that still maintains a specified level of biomass
production. First, we maximized the biomass production flux $v_{biomass}$. The
solution yields the maximum theoretical level of biomass production
($v_{biomass}^{max}$ = … g biomass/gDW-hr) achievable by the metabolic network within the
stoichiometric constraints. Next, the minimum number of genes that
maintains a specified target level of biomass production $v_{biomass}^{target}$ (as a percentage
of the maximum) is determined. The new objective function minimizes the
total number of functional E. coli genes available to the cell subject to the
constraint of setting biomass production $v_{biomass}$ greater than or equal to $v_{biomass}^{target}$.
This problem is formulated as:

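$$
\begin{aligned}
\text{Minimize} \quad & Z = \sum_{k \in E} y_k \\
\text{subject to} \quad & \sum_{j} S_{ij} v_j = 0, && \forall i \\
& v_{biomass} \ge v_{biomass}^{target} \\
& L_j \Big( \sum_{k} a_{jk} y_k \Big) \le v_j \le U_j \Big( \sum_{k} a_{jk} y_k \Big), && \forall j \\
& v_j \in \mathbb{R}, && \forall j \\
& y_k \in \{0, 1\}, && \forall k \in E
\end{aligned}
$$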



where the nonzero elements of $y_k^*$ define the minimum gene set capable of
attaining the target growth rate. The smallest gene set, $M_{100\%}$, capable of
sustaining the maximum theoretical growth rate is obtained by setting
$v_{biomass}^{target} = 100\% \cdot v_{biomass}^{max}$. The model predicts that 202 non-transport intracellular
reactions out of 400 available reactions (111 × 2 reversible reactions + 178
irreversible reactions) are required to sustain $v_{biomass}^{max}$. These reactions include
the glycolytic reactions, the pentose phosphate pathway, the TCA cycle, the
respiratory reactions, and all other anabolic and catabolic routes necessary for
optimal growth.
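Once the set of reactions that must carry flux is fixed, the linking constraint reduces to $\sum_k a_{jk} y_k \ge 1$ for each of those reactions, and finding the fewest genes becomes a set-cover problem. The brute-force sketch below solves only that reduced subproblem on a hypothetical toy gene-reaction map. It is not the full mixed-integer problem above, which selects fluxes and genes simultaneously and, per the text, is solved with CPLEX via GAMS.

```python
from itertools import combinations
from typing import Optional


def minimal_gene_set(required: list[str], a: dict[str, set[str]]) -> Optional[set[str]]:
    """Smallest gene set such that every required reaction has at least one
    functional gene coding for its enzyme (sum_k a_jk y_k >= 1).

    Exhaustive search over subsets of increasing size: fine for toy inputs,
    exponential in general.
    """
    candidates = sorted(set().union(*(a[j] for j in required)))
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(a[j] & chosen for j in required):
                return chosen
    return None  # only if some required reaction has no coding gene


# Hypothetical map: reaction -> genes coding for its enzyme (a_jk = 1).
a = {
    "R1": {"g1"},
    "R2": {"g2", "g3"},  # isoenzymes
    "R3": {"g3"},        # g3 also catalyzes R2
    "R4": {"g4"},
}
print(sorted(minimal_gene_set(["R1", "R2", "R3", "R4"], a)))  # ['g1', 'g3', 'g4']
```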

Given $M_{100\%}$, the next goal is to determine which of these genes could be
knocked out while still allowing the metabolic network to sustain specified
sub-optimal growth rates. This is accomplished by setting $v_{biomass}^{target}$ equal to
various percentages of $v_{biomass}^{max}$ and constraining the intracellular reaction fluxes
outside of $M_{100\%}$ to zero. It must be noted that this assumption prevents the
model from activating any genes outside of the $M_{100\%}$ set; the significance
of this assumption will be discussed in the following section. The numbers of
allowable gene knockouts for various biomass production levels are given in
Figure 13, while the selected gene removals are presented in the table of Figure
14. As expected, as the biomass production demands on the network are
lessened, the model tolerates more gene knockouts. However, the range of
allowable knockouts is rather small. Specifically, the model tolerates at most
9 gene deletions with a biomass requirement of …0% of $v_{biomass}^{max}$, while 18 gene
removals render the network incapable of biomass formation. Thus the subset
containing all elements of $M_{100\%}$ minus the 18 gene knockouts (194 genes)
describes the smallest subset of $M_{100\%}$ capable of sustaining E. coli cellular
growth for the employed FBA model. Additionally, it must be noted that all
subsets include the seven experimentally verified essential gene products of
central metabolism identified by the in silico gene deletion study of E. coli
conducted by Edwards and Palsson (2000b).

8.5 Discussion on the Gene Deletion Study 
Investigation of the specific gene knockouts provides interesting insight
into the effect of various energy generation pathways. The suggested gene
deletions imply that the energetic status of the network is improved as the
required biomass production demands on the cell are reduced. This is
demonstrated by the fact that as the biomass requirements are lessened, the
optimization formulation sequentially eliminates pathways responsible for the
formation of energy. One such observation involves the gradual degradation of
the TCA cycle. When the model is constrained to produce only 80% of the
optimal level of biomass, the network no longer utilizes the succinate
dehydrogenase enzyme to produce FADH2. Further reducing the biomass
production requirement to 70% enables the removal of the fumAB, mdh, and
sucCD genes, forgoing the formation of one GTP and one NADH per unit
reaction flux. The next major energy formation pathway to be eliminated
occurs at a biomass production level of 20%. At this point, the energetic state
of the cell is such that it no longer requires the formation of ATP from the
cellular proton gradient. Finally, at the lowest biomass production levels, the
cell no longer requires the oxidation of NADH to force protons across the
cellular membrane.

This study provides insight into the dependence of cellular growth on
various energy generation pathways and provides an estimate of the minimum
number of metabolic genes capable of enabling cellular growth. The prediction
of 194 genes is lower than the theoretical estimate of 256 by Mushegian and
Koonin (1996), obtained by investigating the complete genomes of Haemophilus
influenzae and Mycoplasma genitalium and assuming that genes preserved across
large phylogenetic distances are most likely essential. This was expected
considering the inability of this reaction-based framework to account for genes
associated with translation, transcription, replication, and repair, and the
lumping of pathways by the stoichiometric model. A more practical
comparison involves considering the number of metabolic genes included in the
minimal gene set estimate. In this case, the predicted set of 194 metabolic
genes overestimates the 94 metabolic genes included in the minimal gene set
proposed by Mushegian and Koonin (1996). This overestimation arises in part
because the effect of activating metabolic genes outside of the original optimal
gene set was not investigated; allowing such activations could lower the minimal
gene set estimate by opening additional metabolic routes. Furthermore, this study only allowed
glucose to enter the network as organic fuel, whereas limited metabolic capacity can
be compensated for by a proportionately greater dependence on the
importation of nucleosides, amino acids, and other metabolites. C. A.
Hutchison, et al., Science 286, 2165 (1999).
8.6 Amino Acid Production Optimization Studies 

In this section, we identify mathematically optimal reaction pathways to
recombine into the E. coli metabolic network to optimize amino acid formation
for growth on glucose and acetate. We explored the theoretically optimal
formation of all twenty amino acids. Each optimization run was performed for
two cases: (i) including only the reactions present in E. coli, and (ii) allowing
the model to select all reactions from the Universal stoichiometric matrix. The
problem of maximizing amino acid production is formulated by
substituting amino acid accumulation, $b_{aa}$, in place of $v_{biomass}$ in equation (7),
while the problem of maximizing the amino acid formation, $b_{aa}$, of the
Universal network is formulated as:

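$$
\begin{aligned}
\text{Maximize} \quad & Z = b_{aa} \\
\text{subject to} \quad & \sum_{j} S_{ij} v_j = 0, && \forall i \\
& y_k = 1, && \forall k \\
& L_j \Big( \sum_{k} a_{jk} y_k \Big) \le v_j \le U_j \Big( \sum_{k} a_{jk} y_k \Big), && \forall j
\end{aligned}
$$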



Note that this formulation allows the selection of any number of
reactions from the multi-species reaction list. Reactions chosen by the model
but absent in E. coli (i.e., all non-zero elements of NE) provide routes for
manipulating the cellular metabolism through recombinant DNA technology.
The theoretical amino acid production capabilities of the E. coli metabolic
network, with and without the additional reactions from the Universal matrix,
are shown in the table of Figure 9 for growth on glucose and acetate. It must
be noted that the structural pathway changes predicted by the model
are more meaningful than the exact numerical values, because these are
theoretical maximum yield calculations. Predictions by the Varma and
Palsson (1993) model are shown for comparison. As expected, the maximum
production capabilities of the Varma and Palsson (1993) model are slightly
below the predictions of the more complex model employed here, owing to the
additional metabolic routes available for production.

The results show that improvements to seven amino acid production
pathways of E. coli are theoretically attainable with the addition of genes from
various organisms. Manipulation of the arginine pathway shows the most
promise, with 8.75% and 9.05% increases with additional genes for growth on
glucose and acetate, respectively. The optimal recombinant asparagine
pathway shows 5.77% and 5.45% increases over the native E. coli pathway for
growth on glucose and acetate, while cysteine production can be raised …% and 3.80%,
respectively. The histidine production pathway is revealed as another
encouraging target for DNA recombination, with 0.23% and 4.53%
improvements available as well. The isoleucine, methionine, and tryptophan
formation pathways offer the final three genetic objectives for enhancing
production.

The enzymes responsible for introducing these various improvements to
the E. coli amino acid production pathways are shown in the table of Figure
15. In most cases, the addition of only one or two genes to the original amino
acid production pathway results in an increased maximum theoretical yield,
even though the complete list of 3,400 reactions was available for selection.
For example, introducing foreign genes coding for carbamate kinase and the
pyrophosphate-dependent version of 6-phosphofructokinase further optimizes
arginine production for growth on glucose, while adding carbamate kinase and
another gene coding for acetate kinase renders the arginine production
pathway on acetate stoichiometrically optimal. Expressing the genes coding
for aspartate-ammonia ligase and sulfate adenylyltransferase in E. coli results
in the increases mentioned earlier in asparagine and cysteine production,
respectively. Only the production of isoleucine on glucose and acetate
substrates and the production of methionine on acetate require more than two
additional enzymes to reach optimality according to the model.

8.7 Discussion on the Gene Addition Study 

Careful examination of these amino acid pathways reveals how these
additional enzymes improve the energetic efficiency of the original routes. The
original and Universal arginine production pathways for growth on glucose are
shown in Figure 16. The two pathways differ in only two reactions: the
pyrophosphate-dependent analog of 6-phosphofructokinase in the Universal
model replaces the ATP-dependent version present in E. coli, and carbamate
kinase in the Universal model replaces carbamoyl phosphate synthetase from
the original E. coli model. The first improvement to energy utilization occurs
because the Universal model's 6-phosphofructokinase uses pyrophosphate
formed by the argininosuccinate synthase reaction, instead of ATP, to transfer a
phosphate group to fructose-6-phosphate in the third step of glycolysis. The E.
coli model, which sends this pyrophosphate through pyrophosphatase for
hydrolytic cleavage, in effect wastes the energy of this energy-rich
phosphoanhydride bond. By recapturing this otherwise wasted energy, the
pyrophosphate version of 6-phosphofructokinase requires one fewer ATP
phosphoanhydride bond per arginine molecule produced.

The second form of cellular energy savings is realized by the
replacement of carbamoyl phosphate synthetase. The native carbamoyl
phosphate synthetase creates one mole of carbamoyl phosphate from carbon
dioxide at the expense of two ATP phosphoanhydride bonds. This reaction also
requires the amino group of one glutamine molecule, which subsequently forms
glutamate. Reforming glutamine from glutamate requires yet another ATP;
thus each unit flux through carbamoyl phosphate synthetase requires three
ATP. Carbamate kinase, incorporated in the Universal model, forms
carbamoyl phosphate from carbon dioxide and ammonia at the expense of only
one ATP. Therefore, carbamate kinase requires two fewer ATP bonds per unit
flux of carbamoyl phosphate formed. Overall, the additional genes used by the
Universal model save the original pathway three net ATP bonds, increasing
arginine production by 8.75%. A similar analysis can be performed on the native
and Universal arginine production routes from acetate substrate depicted in
Figure 17.
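The ATP bookkeeping in this comparison can be tallied explicitly. The sketch below encodes only the phosphoanhydride-bond costs stated in the text for the two steps that differ between the native and Universal arginine routes on glucose, assuming one carbamoyl phosphate and one pyrophosphate (from argininosuccinate synthase) per arginine. The dictionary and function names are hypothetical. It reproduces the three-ATP saving but not the 8.75% yield increase, which comes from the full flux balance model.

```python
def atp_cost(steps: dict[str, int]) -> int:
    """Total ATP phosphoanhydride bonds consumed per arginine by the listed steps."""
    return sum(steps.values())


native = {
    # ATP phosphorylates fructose-6-phosphate; the PPi released by
    # argininosuccinate synthase is hydrolyzed by pyrophosphatase (energy lost).
    "6-phosphofructokinase (ATP-dependent)": 1,
    # 2 ATP in the reaction itself + 1 ATP to re-form glutamine from glutamate.
    "carbamoyl phosphate synthetase": 3,
}
universal = {
    # Uses the otherwise wasted PPi instead of ATP.
    "6-phosphofructokinase (PPi-dependent)": 0,
    # CO2 + NH3 -> carbamoyl phosphate at the expense of one ATP.
    "carbamate kinase": 1,
}

saving = atp_cost(native) - atp_cost(universal)
print(f"ATP bonds saved per arginine: {saving}")  # 3
```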

The E. coli asparagine production pathway is shown in Figure 18 for two
modes of glucose entry into the metabolic network: glucokinase and the
phosphotransferase system. Interestingly, the E. coli model prefers
glucokinase to the more common phosphotransferase system for glucose entry
during optimal asparagine production. Although glucokinase is known to play
a minor role in glucose metabolism under normal conditions, replacement of
the phosphotransferase system by this reaction increases asparagine
production from 1.560 mol/mol glucose to 1.818 mol/mol glucose. Glucose
entry via the phosphotransferase system requires substantial flux through
phosphoenolpyruvate (PEP) synthase to regenerate PEP from pyruvate, at
the net expense of one ADP phosphoanhydride bond. Thus either
over-expressing glucokinase in E. coli or adding a more active recombinant
glucokinase enzyme may improve asparagine production. Figure 19 illustrates
the optimal Universal route for asparagine production on glucose. By choosing
the ADP-forming aspartate-ammonia ligase enzyme over the AMP-forming
version present in E. coli, the energy efficiency of this pathway is improved.
Presently no pathways for the conservation of the pyrophosphate bond energy
have been identified in E. coli; thus the formation of AMP uses the equivalent
of two ATP phosphoanhydride bonds. In contrast, by forming ADP, the
Universal pathway requires the breakage of only one phosphoanhydride bond
per unit flux. In fact, the energy efficiency of the Universal model is such that
the formation of asparagine does not require ATP formation from the
transmembrane proton gradient. This gradient is used solely to transport inorganic
phosphate into the cell. This mechanism improves asparagine production
by 5.77% for growth on glucose and 5.45% for growth on acetate.
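Two quantities in this paragraph can be checked with simple arithmetic: the relative gain from glucose entry via glucokinase (computed here from the stated yields of 1.560 and 1.818 mol/mol glucose; the resulting percentage is derived, not quoted from the text) and the bond saving of the ADP-forming ligase. A minimal sketch; the function and variable names are hypothetical.

```python
def relative_increase(new: float, old: float) -> float:
    """Fractional yield improvement, e.g. 0.10 for a 10% increase."""
    return (new - old) / old


# Asparagine yield in the E. coli model (mol/mol glucose), from the text.
yield_pts, yield_glk = 1.560, 1.818
gain = relative_increase(yield_glk, yield_pts)
print(f"glucokinase instead of PTS: +{gain:.1%}")  # +16.5%

# Phosphoanhydride bonds consumed per unit flux of aspartate-ammonia ligase.
# AMP-forming: ATP -> AMP + PPi, and the PPi is only hydrolyzed (2 bonds).
# ADP-forming: ATP -> ADP + Pi (1 bond).
bonds = {"AMP-forming (E. coli)": 2, "ADP-forming (Universal)": 1}
saved = bonds["AMP-forming (E. coli)"] - bonds["ADP-forming (Universal)"]
print(f"bonds saved per unit flux: {saved}")  # 1
```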

The optimal histidine production pathways of the E. coli and Universal
models for growth on acetate are shown in Figure 20. Again, the Universal
model selects a reaction to conserve the phosphoanhydride bond energy of
pyrophosphate, generated in this case by both ATP phosphoribosyltransferase
and phosphoribosyl-ATP pyrophosphatase. Thus the Universal model is at
least 2 ATP more efficient than the E. coli model per histidine molecule
produced. In addition, adding glycine dehydrogenase to the E. coli
model improves the carbon conversion of the native histidine pathway. Under
optimal histidine production conditions in native E. coli, intracellular glycine
is converted to carbon dioxide and ammonia by the glycine cleavage system. In
this process, only one of glycine's carbons is conserved by its transfer to
tetrahydrofolate. The Universal model, on the other hand, conserves both
carbons by converting glycine to glyoxylate, which is subsequently pumped
back into the glyoxylate shunt. Together, these two mechanisms improve the maximum
theoretical yield of histidine by 4.53%.
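The two histidine mechanisms can likewise be summarized as simple counts taken from the text: pyrophosphate recaptured per histidine (one from each of the two PPi-releasing steps) and glycine carbons retained. A minimal sketch with hypothetical names; the 4.53% yield gain itself comes from the flux balance model and is not reproduced here.

```python
from fractions import Fraction

GLYCINE_CARBONS = 2

# Carbons of each glycine kept in metabolism under each route (per the text).
carbons_kept = {
    "glycine cleavage system (E. coli)": 1,  # one C to CO2, one to tetrahydrofolate
    "glycine dehydrogenase (Universal)": 2,  # glycine -> glyoxylate keeps both
}
# PPi-releasing steps whose bond energy the Universal model recaptures.
ppi_steps = ["ATP phosphoribosyltransferase", "phosphoribosyl-ATP pyrophosphatase"]

for route, kept in carbons_kept.items():
    print(f"{route}: {Fraction(kept, GLYCINE_CARBONS)} of glycine carbon conserved")
print(f"minimum ATP saved per histidine: {len(ppi_steps)}")
```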

8.8 Conclusions

The proposed optimization framework provided the quantitative means
to study metabolic network performance in response to gene deletions or
additions. Metabolic network performance relates either to robustness in the
face of gene deletions or to flux enhancements through foreign gene
recombination from an ever-expanding database of available genes. Although
complete gene-enzyme relationships are not currently available, the
formulation enables the incorporation of this information as it becomes
available. The gene knockout analysis revealed that the E. coli metabolic
network optimized for growth could endure an increasing number of gene
knockouts as its growth demands are lowered. Furthermore, the network
could theoretically tolerate at most 18 gene deletions before biomass
production is no longer possible. The gene addition studies revealed that
adding options to the E. coli genotype by DNA recombination
provided improvements to the maximum theoretical production of seven
amino acids. These improvements occur by one of two mechanisms: (i) by
improving the energy efficiency or (ii) by increasing the carbon conversion
efficiency of the production route.

The reliance of flux balance analysis strictly on stoichiometric
characteristics is its greatest strength but can also be its most prominent
weakness. The flux distributions within the cell are ultimately uniquely
determined by the regulatory mechanisms within the cell, the kinetic
characteristics of cellular enzymes, and the expression of these enzymes.
Assuming cells operate in a stoichiometrically optimal fashion yields a wider
range of metabolic flux distributions than may actually be available to the cell.
Currently we are incorporating regulatory information into flux balance
models with the use of logic constraints. These constraints will ensure that up
or down movements in metabolite concentrations are consistent with up or
down shifts in reaction flux values. A more tightly constrained model will give
additional insight into how overproducing cellular products affects overall
metabolic regulation. As the accuracy of metabolic models improves and the
amount of information available for flux balance analysis grows, the
framework introduced in this paper can be used to select the optimal
gene addition and/or deletion metabolic manipulations to perform.



9. EXAMPLE: MINIMAL REACTION SETS FOR ESCHERICHIA COLI
METABOLISM UNDER DIFFERENT GROWTH REQUIREMENTS AND
UPTAKE ENVIRONMENTS

The framework of the present invention can be applied to a number of
metabolic network problems in a number of different contexts. The framework
of the present invention has also been applied to determining minimal reaction
sets for E. coli metabolism under different growth requirements and uptake
environments. According to the present invention, a computational procedure
for identifying the minimal set of metabolic reactions capable of supporting
various growth rates on different substrates is introduced and applied to a flux
balance model of the E. coli metabolic network. This task is posed
mathematically as a generalized network optimization problem. The minimal
reaction sets capable of supporting specified growth rates are determined for
two different uptake conditions: (i) limiting the uptake of organic material to a
single organic component (e.g., glucose or acetate), and (ii) allowing the
importation of any metabolite with available cellular transport reactions. We
find that minimal reaction network sets are highly dependent on the uptake
environment and the growth requirements imposed on the network.
Specifically, we predict that the E. coli network, as described by the flux
balance model, requires 224 metabolic reactions to support growth on a
glucose-only medium and 229 for an acetate-only medium, while only 122
reactions enable growth on a specially engineered growth medium.

The recent explosion of fully sequenced genomes has brought significant
attention to the question of how many genes are necessary for sustaining
cellular life. A minimal genome is generally defined as the smallest set of
genes that allows for replication and growth in a particular environment.
Attempts to uncover this minimal gene set include both experimental and
theoretical approaches. Global transposon mutagenesis was used by
Hutchison et al. (1999) to determine that 265 to 350 of the 480 protein-coding
genes of Mycoplasma genitalium, the smallest known cellular genome (580 kb),
are essential for survival under laboratory growth conditions. Additional
experimental work revealed that only 12% and 9%, respectively, of the yeast
and Bacillus subtilis genomes are essential for cellular growth and replication.
M. G. Goebl and T. D. Petes, Cell 46, 983 (1986); M. Itaya, FEBS Lett. 362, 257
(1995). Theoretical methods stem from the assumption that genes conserved
across large evolutionary distances are vital to cellular survival. Based on
this hypothesis, a minimal set of 256 genes was compiled by Mushegian and
Koonin (1996) by assuming that genes common to M. genitalium and Haemophilus
influenzae must be members of a minimal genome. Interestingly, only 6 out of
26 E. coli open reading frames of unknown function conserved in M.
genitalium were deemed essential to species survival (Arigoni et al. 1998). The
existence of multiple, quite different, species- and environment-specific minimal
genomes has long been speculated (Huynen 2000).

Here we describe a computational procedure for testing this claim by
estimating the minimum required growth-sustaining core of metabolic
reactions under different uptake conditions. The latest stoichiometric model of
E. coli metabolism proposed by Palsson and coworkers (Edwards & Palsson
2000b) is employed to identify the smallest set of enzymatic reactions capable
of supporting given targets on the growth rate for either a glucose, an acetate,
or a complex substrate. This flux balance analysis (FBA) model incorporates
154 metabolites and 720 reactions, including glycolysis, the tricarboxylic acid
(TCA) cycle, the pentose phosphate pathway (PPP), and respiration pathways,
along with synthesis routes for the amino acids, nucleotides, and lipids.
Growth is quantified by adding an additional reaction to the model simulating
a drain on the various components of E. coli biomass in their appropriate
biological ratios. F. C. Neidhardt, ed., Escherichia coli and Salmonella: Cellular
and Molecular Biology, ASM Press, Washington, D.C., 1996. By associating
a gene with each metabolic reaction in the network, gene activations and
inactivations are incorporated into the FBA model using logic 0-1 binary
variables. The problem of minimizing the number of active metabolic reactions
required to meet specific metabolic objectives (i.e., growth rates) is shown to
assume the mathematical structure of a generalized network flow problem,
where nodes denote metabolites and connecting arcs represent reactions.

Alternatively, instead of a biomass target, minimum levels of ATP production
or lowest allowable levels of key components/metabolites could readily be
incorporated in the model. A mixed-integer linear programming (MILP)
solver, CPLEX 6.5 accessed via GAMS, is employed to solve the resulting
large-scale combinatorial problems, with CPU times ranging from minutes to
days.
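For reference, one way to write the minimal-reaction-set problem is sketched below. It assumes a binary variable $w_j$ (a symbol introduced here for illustration) that switches reaction $j$ on or off in place of the gene variables $y_k$ of Section 8, consistent with associating one gene with each reaction, and it reuses the stoichiometric matrix $S_{ij}$, the flux bounds $L_j$ and $U_j$, and a biomass target as in the gene-deletion study. This is a sketch, not necessarily the exact formulation given in the appendix.

```latex
\begin{align*}
\min_{v,\,w} \quad & \sum_{j} w_j \\
\text{subject to} \quad & \sum_{j} S_{ij} v_j = 0, && \forall i \\
& v_{biomass} \ge v_{biomass}^{target} \\
& L_j w_j \le v_j \le U_j w_j, && \forall j \\
& w_j \in \{0, 1\}, && \forall j
\end{align*}
```

Setting $w_j = 0$ forces $v_j = 0$, so the objective counts active reactions. In the generalized network flow reading, each mass balance is a conservation constraint at a metabolite node and each $v_j$ is the flow on a reaction arc.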

Based on the E. coli model, the minimal reaction network is explored for
different growth requirements under two contrasting uptake environments: (i)
restricting the uptake of organic material to a single organic component, and
(ii) allowing the uptake of any organic metabolite with a corresponding
transport reaction. These two extreme uptake scenarios were chosen to model
maximum and minimum reliance on internal metabolism for component
synthesis, respectively, and to probe their effect on the minimum reaction set
required. Previous attempts utilized reductionist methodologies to extract the
set of essential genes through a series of gene knockouts. Here we use an
efficient computational procedure for selecting the minimal set by
simultaneously considering the effect of all reactions on cell growth. A
minimal gene set can then be inferred by mapping the enzyme(s) catalyzing
these reactions to the corresponding coding genes. While the obtained results
are, in principle, dependent on the specifics of the employed flux balance E.
coli model (Edwards & Palsson 2000), they still provide valuable insight and
perspective on the questions of what the minimal genome is and how it is
shaped by the environment.






9.1 Results

The first case study involves identifying the minimal reaction set
supporting E. coli growth on a glucose substrate. A detailed description of the
employed modeling procedure is provided in the appendix. A constrained
amount of glucose (≤ 10 mmol/gDW-hr), along with unconstrained uptake
routes for inorganic phosphate, oxygen, sulfate, and ammonia, is enabled to
fuel the metabolic network. Secretion routes for every metabolite capable of
exiting the cell are also provided. Under these conditions, the FBA model
predicts that the E. coli reaction network is capable of achieving a maximum
theoretical growth rate of 0.9… g biomass/gDW-hr, which we will refer to as
the maximum growth rate (MGR). By requiring the reaction network to match
the MGR, we determined that at least 234 reactions out of 720 are required for
maximum growth on glucose.

The growth demands are then relaxed in subsequent studies to identify
the minimal number of metabolic reactions required to meet various
sub-maximal growth demands (% of MGR). Interestingly, the number of necessary
metabolic reactions decreases only mildly with the falling growth demands
imposed on the network, as indicated by Figure 21. While a reaction set
comprising 234 reactions is needed for maximum growth, the minimal
reaction set corresponding to growth rates of 30% and lower involves only 224
reactions. The same minimal reaction set persists even for growth rates as low
as 0.1% of the MGR. In general, the reaction set reductions are attained by
successively eliminating energy-producing reactions occurring in (i) glycolysis,
(ii) the TCA cycle, and (iii) the pentose phosphate pathway as the growth
demands are lessened. However, certain reactions absent at higher growth
rates enter the minimal sets at lower growth rates, suggesting a much more
complex mechanism of flux redirection than successive reaction elimination. A
detailed description of the reactions entering and leaving the minimal reaction set
as the imposed growth requirements are lowered is provided in the table of
Figure 22.






For comparison, a similar study enabling a constrained amount of
acetate (≤ 30 mmol/gDW-hr) to enter the network instead of glucose was
performed (see Figure 21). Here the network is much less tolerant of reaction
set reductions than in the glucose study. While for a glucose substrate the
minimal network sizes decrease from 234 to 224 reactions as the growth
demands are lowered, for an acetate substrate the network sizes drop only
from 231 to 229 reactions. This implies that the minimal reaction set size
depends not only on the imposed biomass production requirements, but also
on the specific choice of the single substrate.

It is important to note that neither the minimal reaction sets nor their corresponding reaction fluxes are unique. For example, for the glucose uptake case at 30% of the MGR, we identified over 100 different minimal reaction sets containing exactly 224 enzymatic reactions, without even counting the multiplicities associated with the 171 isoenzymes present in the network. Across most of these alternative minimal reaction sets, the activity and flux directions of the major pathways differ very little. Most variations are concentrated in the catabolic parts of the networks. For instance, while some minimal reaction sets secrete carbon dioxide, acetate, and fumarate as the only metabolic byproducts, other sets may also secrete varying amounts of formate, glycerol, and the amino acids phenylalanine and tyrosine. These results provide a computational confirmation of the remarkable redundancy and flux redirection versatility of the E. coli network. More importantly, all minimal reaction sets identified include 11 of the 12 reactions whose corresponding gene deletions were determined experimentally to be lethal for growth on glucose. Earlier analyses based on single gene deletions conducted with this model using linear optimization identified only 7 of the 12 lethal gene deletions, underscoring the importance of considering simultaneous gene deletions within an MILP framework.

In the second case study, the uptake or secretion of any organic metabolite is enabled. The amount of organic material entering the network is kept consistent with the first case study by allowing the uptake of a constrained amount of carbon atoms (< 60 mmol/gDW-hr). Unconstrained uptake routes for oxygen, inorganic phosphate, sulfate, and ammonia are also provided, as in the first study. Under these "ideal" uptake conditions, we find that a maximum growth rate (MGR) of 1.341 g biomass/gDW-hr is attainable, requiring at least 201 metabolic reactions. The fact that only five amino acids are imported under maximum growth (i.e., MGR) conditions indicates that it is stoichiometrically more favorable to produce most amino acids internally rather than transport them into the cell from the medium.

This trend, however, is quickly reversed as the growth rate requirement is reduced. The reversal yields a correspondingly sharp decrease in the total number of required reactions as a direct result of the importation of an increasing number of metabolites at sub-maximal target growth demands. The table of Figure 23 lists the metabolites taken up or secreted at each target growth rate, while Figure 24 (100% to 90% of MGR) and Figure 25 (100% to 1% of MGR) illustrate the number of metabolic reactions needed to attain various target growth demands. The rapid reduction in the size of the minimal reaction sets, achieved by importing an increasing number of metabolites as the biomass demands are lessened (see Figure 23), continues until the growth demands are reduced to about 90% of the MGR. Below this growth target (see Figure 25), additional but modest reductions are achieved primarily through flux redirections. Figure 26 summarizes the reactions that are removed from or added to the minimal reaction set as the growth target is successively lowered. The smallest minimal reaction network for the second case study, comprising 122 reactions, is reached when the target growth demands are lowered to 10% of the MGR. This minimal network consists mostly of cell envelope and membrane lipid biosynthetic reactions, along with a number of transport and salvage pathway reactions, as shown in Figure 27. As in the glucose-only study, multiple minimal reaction sets are expected for the multi-organic uptake case.






9.2 Discussion 

In this study, we have identified the minimum number of E. coli metabolic reactions capable of supporting growth under two different uptake environments: (i) a glucose- or acetate-only uptake environment and (ii) free uptake or secretion of any organic metabolite involving a corresponding transport reaction. The obtained results quantitatively demonstrate that minimal reaction sets, and thus corresponding minimal gene sets, are strongly dependent on the uptake opportunities afforded by the growth medium. While an E. coli cell grown on a medium containing only glucose or acetate requires at least 224 or 229 metabolic reactions, respectively, to support growth, a cell cultured on a rich, optimally engineered medium could theoretically support growth with as few as 122 metabolic reactions. In addition, the choice of the single substrate affects the minimal reaction set size and composition. As expected, the minimal reaction set becomes larger as the required growth rate increases. However, the magnitude of this increase is quite different for the examined cases. While in case (i) the minimal reaction set increases only from 224 to 234 to meet the maximum growth rate on glucose and from 229 to 231 for acetate growth, in case (ii) the minimal reaction set grows by roughly 65%, from 122 to 201 reactions. Another significant observation is the large redundancy of the E. coli metabolic network, which is capable of supporting growth utilizing only 31% of the available metabolic reactions for growth on glucose, and only 17% of the available reactions for growth on a complex medium. Even these reduced minimal reaction network sets exhibit large multiplicities. Specifically, a non-exhaustive list of 100 alternative minimal reaction sets was identified for the glucose-only uptake case.
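The percentages quoted in this discussion follow directly from the reported reaction counts and the 720 reactions of the full model. The following is a minimal arithmetic check; the dictionary labels are illustrative only and are not taken from the source.

```python
# Reaction counts reported in the text for the 720-reaction E. coli model.
TOTAL_REACTIONS = 720

MINIMAL_SET_SIZES = {
    "glucose, maximum growth": 234,
    "glucose, sub-maximal growth": 224,
    "acetate, maximum growth": 231,
    "acetate, sub-maximal growth": 229,
    "any organic uptake, maximum growth": 201,
    "any organic uptake, 10% of MGR": 122,
}


def percent_of_network(n_reactions: int, total: int = TOTAL_REACTIONS) -> float:
    """Share of the full reaction network retained by a minimal set, in percent."""
    return 100.0 * n_reactions / total


def relative_increase(smaller: int, larger: int) -> float:
    """Percent growth of a minimal set when the growth requirement is raised."""
    return 100.0 * (larger - smaller) / smaller


if __name__ == "__main__":
    for label, size in MINIMAL_SET_SIZES.items():
        print(f"{label:36s} {size:4d} reactions  {percent_of_network(size):5.1f}%")
    print(f"case (ii), 122 -> 201: +{relative_increase(122, 201):.0f}%")
```

Rounded to whole percentages, 224 and 122 reactions correspond to the 31% and 17% figures. The case (ii) set grows by about 65% between 10% of the MGR and the MGR.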

It must be noted that our analysis provides a species-specific minimal metabolic reaction set, which is a subset of the complete E. coli minimal genome. This is a consequence of the adopted reaction-based analysis, which cannot account for genes associated with translation, replication, recombination, repair, or transcription, or for genes of unknown function. A comparison of our minimal metabolic reaction set with the essential gene set of Hutchison et al. and the minimal gene set proposed by Mushegian and Koonin in their studies with Mycoplasma genitalium is provided in Figure 28. The obtained results agree conceptually with the finding of Hutchison and coworkers (2) that limited metabolic capacity can be compensated for by a proportionately greater dependence on the importation of nucleosides, amino acids, and other metabolites. Although a complete genome-based reconstruction of the M. genitalium metabolic network is currently unavailable, preventing a reaction-by-reaction comparison, the distributions of metabolic genes/reactions among the various functional classifications in the three studies are quite similar. This suggests that the simultaneous reaction removal strategy applied to E. coli in this work may parallel the evolutionary pressures placed on M. genitalium to reduce its genome size. The overestimation of the minimal reaction set size in our analysis may be largely due to its species-specific nature. Whereas the cellular envelope of E. coli contains a cell wall made up largely of peptidoglycan, the cellular envelope of mycoplasmas lacks a cell wall. Thus, many of the cellular envelope reactions necessary for E. coli survival are not included in the gene sets of Hutchison et al. and of Mushegian and Koonin. Another contributing factor is that we assign a different reaction/gene to the uptake or secretion of each metabolite, although similar metabolites can be transported by mechanisms associated with a single gene. Furthermore, since our analysis is based on the E. coli model, more efficient reaction combinations, perhaps occurring in non-E. coli species, could further reduce the minimal gene set, lowering the discrepancy.

This framework can be utilized to construct minimal reaction sets for additional species. By contrasting these minimal sets, one could infer how minimal reaction sets (metabolic gene sets) compare along different evolutionary branches. Specifically, minimal reaction sets for M. genitalium and H. influenzae could be determined and benchmarked against earlier studies. Additionally, a species-independent minimal metabolic reaction set can be pursued by lumping reactions occurring in many different species within a universal stoichiometric matrix (15,16). As more elaborate models are developed describing elementary functions of minimal cells, more detail can be added to the model. Apart from utilizing this MILP framework for rationally identifying "minimal" metabolic networks, it can also be used to predict in silico lethal gene deletions for different organisms and uptake environments. By identifying lethal gene deletions for pathogenic microbes (e.g., H. pylori), a ranked list of promising targets for therapeutic intervention (i.e., interruption of gene expression) can be compiled. Even though the proposed computational procedure depends on the assumptions of the adopted FBA model, it affords the versatility to study different uptake/secretion environments as well as to encompass reaction sets from multiple species in the search for the minimal genome.



9.3 Modeling and Computational Protocol 

Flux balance analysis relies on the stoichiometry of biochemical pathways and cellular composition information to identify the flux distributions potentially available to the cell. For a metabolic network comprising $N$ metabolites and $M$ metabolic reactions we have

$$\frac{dX_i}{dt} = \sum_{j=1}^{M} S_{ij} v_j, \qquad i = 1, \ldots, N$$

where $X_i$ is the concentration of metabolite $i$, $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, and $v_j$ represents the flux of reaction $j$. At steady state the accumulation terms vanish, so these balances become linear constraints on the fluxes. Typically, the resulting system of equations is underdetermined (the number of reactions exceeds the number of metabolites). The maximization of growth rate is sometimes employed as a surrogate for cell fitness. The key assumption is that the cell is capable of spanning all flux combinations allowable by the stoichiometric constraints and thus of achieving any flux distribution that maximizes a given metabolic objective. This may overestimate the region of accessible fluxes by neglecting kinetic and/or regulatory constraints. The optimization model (linear programming) for maximizing biomass production, or equivalently growth rate (assuming a 1 gDW-hr basis), is:

$$\text{Maximize } Z = v_{\text{biomass}}$$

$$\text{subject to } \sum_{j=1}^{M} S_{ij} v_j = b_i, \qquad i = 1, \ldots, N$$

where $v_{\text{biomass}}$ is the corresponding reaction flux comprising all necessary components of biomass in their respective ratios. One gram of biomass is produced per unit flux of $v_{\text{biomass}}$. Variable $b_i$ quantifies the uptake (negative sign) or secretion (positive sign) of metabolite $i$. In case (i), only ammonia, glucose (or acetate, in the acetate study), oxygen, phosphate, and sulfate are allowed to have a negative value for $b_i$, and any metabolite with a transport reaction out of the cell can be secreted, while in case (ii) all organic metabolites can be imported. In this study we explore the minimum number of metabolic reactions capable of maintaining maximum and sub-maximal levels of biomass production. By mapping reactions to their corresponding genes, a connection between biomass production and gene expression is established. The presence/absence of reactions, and therefore genes, is described mathematically by incorporating logic 0-1 variables into the flux balance analysis framework. These binary variables

$$y_j = \begin{cases} 1 & \text{if reaction flux } v_j \text{ is active} \\ 0 & \text{if reaction flux } v_j \text{ is not active} \end{cases} \qquad j = 1, \ldots, M$$

assume a value of one if reaction $j$ is active and a value of zero if it is inactive.
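The following constraint,

$$v_j^{\min} \, y_j \le v_j \le v_j^{\max} \, y_j, \qquad j = 1, \ldots, M$$

ensures that reaction flux $v_j$ is set to zero when no gene coding for the enzyme catalyzing reaction $j$ is present and functional. Alternatively, when such a gene is active, $v_j$ is free to take values between a lower bound $v_j^{\min}$ and an upper bound $v_j^{\max}$. The mixed-integer linear programming problem of minimizing the total number of functional reactions in the network capable of meeting a target for biomass production $v_{\text{biomass}}^{\text{target}}$ is as follows:

$$\text{Minimize } \sum_{j=1}^{M} y_j$$

$$\text{subject to } \sum_{j=1}^{M} S_{ij} v_j = b_i, \qquad i = 1, \ldots, N$$

$$v_j^{\min} \, y_j \le v_j \le v_j^{\max} \, y_j, \qquad j = 1, \ldots, M$$

$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$$

$$y_j \in \{0, 1\}, \qquad j = 1, \ldots, M$$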

The above MILP belongs to the class of generalized network problems. 
Here each metabolite constitutes a node and each reaction represents an arc in 
the network. 
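The following sketch makes the formulation concrete. It checks whether a candidate flux vector v and binary vector y satisfy the MILP constraints above: the mass balances, the on/off flux bounds, and the biomass target. The four-reaction network, its exchange vector, and the bounds are hypothetical and are not taken from the E. coli model. The exchange vector is also held fixed here for simplicity, whereas in the model each $b_i$ is a variable. With three metabolites and four reactions the balances are underdetermined, so more than one flux vector is feasible.

```python
from __future__ import annotations

# Hypothetical toy network (illustration only, not part of the E. coli model):
#   R1: A -> B,  R2: A -> C,  R3: C -> B,  BIO: B -> biomass
REACTIONS = ["R1", "R2", "R3", "BIO"]
S = [  # rows: metabolites A, B, C; columns: reactions in REACTIONS order
    [-1, -1, 0, 0],
    [1, 0, 1, -1],
    [0, 1, -1, 0],
]
B_EXCHANGE = [-10.0, 0.0, 0.0]  # A is taken up (negative sign); B and C are not exchanged
V_MIN = [0.0, 0.0, 0.0, 0.0]    # all reactions treated as irreversible
V_MAX = [100.0, 100.0, 100.0, 100.0]
BIOMASS = REACTIONS.index("BIO")


def is_feasible(v: list[float], y: list[int], biomass_target: float, tol: float = 1e-9) -> bool:
    """Check one candidate (v, y) against the MILP constraints."""
    # Mass balances: sum_j S_ij v_j = b_i for every metabolite i.
    for row, b_i in zip(S, B_EXCHANGE):
        if abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v)) - b_i) > tol:
            return False
    # On/off bounds: v_min_j * y_j <= v_j <= v_max_j * y_j with binary y_j.
    for j, (v_j, y_j) in enumerate(zip(v, y)):
        if y_j not in (0, 1):
            return False
        if not (V_MIN[j] * y_j - tol <= v_j <= V_MAX[j] * y_j + tol):
            return False
    # Biomass target: v_biomass >= v_biomass_target.
    return v[BIOMASS] >= biomass_target - tol


def objective(y: list[int]) -> int:
    """MILP objective: the number of active reactions."""
    return sum(y)


if __name__ == "__main__":
    print(is_feasible([10, 0, 0, 10], [1, 0, 0, 1], 5.0), objective([1, 0, 0, 1]))  # True 2
    print(is_feasible([0, 10, 10, 10], [0, 1, 1, 1], 5.0), objective([0, 1, 1, 1]))  # True 3
```

Both candidates satisfy the balances, but the objective prefers the two-reaction route. A solver such as CPLEX searches over (v, y) directly instead of checking hand-picked candidates.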

The presence of over one thousand binary variables causes the problem to become computationally intractable for some instances. In particular, the computational burden increases for lower biomass targets, and it is much greater for case (ii) than for case (i) due to the added complexity associated with multiple uptakes. To alleviate the computational burden, four preprocessing techniques are employed: (i) isoenzyme grouping, (ii) futile cycle exclusion, (iii) flux bounds generation, and (iv) connectivity constraint addition. Isoenzyme grouping refers to the aggregation of the 171 reactions catalyzed by isoenzymes. Reactions differing only in the catalyzing enzyme (i.e., isoenzymes) are grouped together, treating all isoenzymes as a single reaction. This reduces complexity by pruning the total number of binary variables. Futile cycle exclusion addresses the removal of sets of reactions (two or more) that collectively recycle fluxes in a loop without any net effect on metabolism.



A special case is reversible reactions with nonzero fluxes in both directions. In general, a set $K$ composed of $|K|$ reactions forms a futile cycle if

$$\sum_{j \in K} S_{ij} = 0, \qquad i = 1, \ldots, N$$

The following constraint ensures that at least one of these reactions will be inactive, breaking the cycle:

$$\sum_{j \in K} y_j \le |K| - 1$$

Overall, 346 futile cycles were identified and eliminated from the model. Most of these futile cycles simply involved reversible reactions.
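Preprocessing steps (i) and (ii) can be sketched on a dictionary-of-stoichiometries representation. The sketch below uses hypothetical reaction names. It groups reactions with identical stoichiometry (isoenzymes) and detects the special case of futile cycles noted above: pairs such as the forward and backward halves of a reversible reaction, whose coefficients sum to zero for every metabolite. It then checks the cycle-breaking constraint. Larger cycles (|K| > 2) are not searched for here.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations

# Hypothetical reactions: R1a and R1b are isoenzymes for A -> B;
# R2f and R2b are the forward and backward halves of C <-> D.
REACTIONS: dict[str, dict[str, float]] = {
    "R1a": {"A": -1, "B": 1},
    "R1b": {"A": -1, "B": 1},
    "R2f": {"C": -1, "D": 1},
    "R2b": {"C": 1, "D": -1},
}


def group_isoenzymes(reactions: dict[str, dict[str, float]]) -> list[list[str]]:
    """Group reactions that differ only in the catalyzing enzyme (same stoichiometry)."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for name, stoich in reactions.items():
        groups[tuple(sorted(stoich.items()))].append(name)
    return [sorted(names) for names in groups.values()]


def two_reaction_futile_cycles(reactions: dict[str, dict[str, float]]) -> list[tuple[str, str]]:
    """Pairs K = {a, b} whose stoichiometric coefficients sum to zero for every metabolite."""
    cycles = []
    for a, b in combinations(sorted(reactions), 2):
        metabolites = set(reactions[a]) | set(reactions[b])
        if all(reactions[a].get(m, 0) + reactions[b].get(m, 0) == 0 for m in metabolites):
            cycles.append((a, b))
    return cycles


def cycle_cut_satisfied(y: dict[str, int], cycle: tuple[str, ...]) -> bool:
    """Cycle-breaking constraint: sum over j in K of y_j <= |K| - 1."""
    return sum(y[j] for j in cycle) <= len(cycle) - 1


if __name__ == "__main__":
    print(group_isoenzymes(REACTIONS))            # [['R1a', 'R1b'], ['R2f'], ['R2b']]
    print(two_reaction_futile_cycles(REACTIONS))  # [('R2b', 'R2f')]
```

Grouping first keeps isoenzymes from inflating the count of binary variables. The cycle cut then forbids both halves of a reversible reaction from being active at once.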

The solution time of the resulting MILP problems is highly dependent on the tightness of the imposed lower ($v_j^{\min}$) and upper ($v_j^{\max}$) bounds on the fluxes $v_j$. Tight bounds $v_j^{\min}$ and $v_j^{\max}$ are obtained by minimizing and maximizing, respectively, every single reaction flux $v_j$ subject to the flux balance constraints and the biomass target specification:

$$\text{Maximize / Minimize } v_j$$

$$\text{subject to } \sum_{j=1}^{M} S_{ij} v_j = b_i, \qquad i = 1, \ldots, N$$

$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$$

This is a linear programming (LP) problem (no binary variables) and is quickly solved (i.e., in less than a few seconds) for all cases. Note that different bounds are generated for different biomass targets, and the higher the biomass target is, the tighter the obtained bounds are.
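A small worked example shows why higher biomass targets tighten the bounds. Consider a hypothetical network (not from the source) in which metabolite A is taken up at no more than 10 flux units and is converted either to a biomass precursor P (flux $v_1$) or to a secreted byproduct W (flux $v_2$). P is drained by the biomass reaction, and all fluxes are irreversible. Solving the minimization and maximization LPs by hand for a biomass target $t \le 10$ gives:

```latex
\begin{aligned}
\text{A:}\quad & -v_1 - v_2 = b_A, \quad b_A \ge -10 \;\Longrightarrow\; v_1 + v_2 \le 10 \\
\text{P:}\quad & v_1 - v_{\text{biomass}} = 0 \;\Longrightarrow\; v_1 = v_{\text{biomass}} \ge t \\
\text{W:}\quad & v_2 = b_W \ge 0 \\
\text{bounds:}\quad & t \le v_1 \le 10, \qquad 0 \le v_2 \le 10 - t, \qquad t \le v_{\text{biomass}} \le 10 \\
t = 2:\quad & 0 \le v_2 \le 8 \\
t = 8:\quad & 0 \le v_2 \le 2
\end{aligned}
```

Raising the target from 2 to 8 shrinks the admissible range of $v_2$ from [0, 8] to [0, 2]. This is the tightening effect described above, and it is why bounds must be regenerated for each biomass target.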

Connectivity constraints are also added to ensure that if a reaction producing an intracellular metabolite is active, then at least one reaction consuming this metabolite must be active, and vice versa. In addition, if a reaction transporting an extracellular metabolite into the cell is active, then at least one intracellular reaction consuming this metabolite must be active, and vice versa. These relations are incorporated in the model as follows, after partitioning the reaction set $J$ into two subsets: $J_{\text{int}}$, representing intracellular reactions, and $J_{\text{trans}}$, representing reactions transporting metabolites to and from the cell. The metabolite set $I$ is also partitioned into two subsets, with $I_{\text{int}}$ and $I_{\text{ext}}$ representing intracellular and extracellular metabolites, respectively.



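$$y_j \le \sum_{j' \in J \,:\, S_{ij'} > 0} y_{j'}, \qquad \forall i \in I_{\text{int}}, \ \forall j \in \{ J \mid S_{ij} < 0 \} \qquad (10)$$

$$y_j \le \sum_{j' \in J \,:\, S_{ij'} < 0} y_{j'}, \qquad \forall i \in I_{\text{int}}, \ \forall j \in \{ J \mid S_{ij} > 0 \} \qquad (11)$$

(12)  [transport connectivity constraint over $J_{\text{trans}}$; equation illegible in source]

(13)  [transport connectivity constraint over $J_{\text{trans}}$; equation illegible in source]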

These connectivity constraints are also employed to identify the smallest set of reactions capable of ensuring adequate connectivity between the external metabolites and the components of biomass. This problem involves minimizing $\sum_j y_j$ subject to constraints (10)-(13) with an active biomass reaction.
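The connectivity-only problem just described can be illustrated on a hypothetical seven-reaction network. The sketch implements constraints (10) and (11) over all reactions, including the two uptake reactions, which only produce intracellular metabolites. It finds the smallest set containing an active biomass reaction by exhaustive enumeration in order of increasing size. This enumeration stands in for the MILP solver and is only practical for very small networks. Constraints (12) and (13) are not modeled.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical toy network (illustration only). Only intracellular metabolites
# A-D are listed, so the uptake reactions T_A and T_C appear as pure producers.
S: dict[str, dict[str, int]] = {
    "T_A": {"A": 1},
    "T_C": {"C": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"C": -1, "B": 1},
    "R3": {"A": -1, "D": 1},
    "R4": {"D": -1, "B": 1},
    "BIO": {"B": -1},
}
INTRACELLULAR = {"A", "B", "C", "D"}


def connected(active: set[str]) -> bool:
    """Constraints (10)-(11): an active consumer of an intracellular metabolite needs an
    active producer of it, and an active producer needs an active consumer."""
    for met in INTRACELLULAR:
        produced = any(S[j].get(met, 0) > 0 for j in active)
        consumed = any(S[j].get(met, 0) < 0 for j in active)
        if produced != consumed:
            return False
    return True


def smallest_connected_set(biomass: str = "BIO") -> set[str] | None:
    """Minimize the number of active reactions subject to (10)-(11) with the biomass
    reaction active, by exhaustive search in order of increasing set size."""
    names = sorted(S)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            active = set(combo)
            if biomass in active and connected(active):
                return active
    return None


if __name__ == "__main__":
    print(sorted(smallest_connected_set()))  # ['BIO', 'R1', 'T_A']
```

This network also admits an equally small alternative, {T_C, R2, BIO}, and a larger one through R3 and R4. That multiplicity mirrors the non-uniqueness of the minimal reaction sets reported above.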

The iterative generation of the multiple minimal reaction sets is achieved by accumulating integer cuts and re-solving the MILP formulation. Each integer cut excludes one previously found solution. For example, solution $y_j^*$ is excluded from consideration by adding the following integer cut:

$$\sum_{j \,:\, y_j^* = 0} y_j \ge 1$$
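The cut-and-resolve loop can be sketched on a hypothetical seven-reaction toy network. After each solution y*, the cut requires at least one reaction that was inactive in y* to become active. The problem is then re-solved, and the loop stops once the next solution is larger than the minimum. Exhaustive search stands in for re-solving the MILP, and connectivity constraints (10) and (11) stand in for the full constraint set. All names are illustrative assumptions.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical toy network (illustration only); uptake reactions T_A, T_C only produce.
S = {
    "T_A": {"A": 1}, "T_C": {"C": 1},
    "R1": {"A": -1, "B": 1}, "R2": {"C": -1, "B": 1},
    "R3": {"A": -1, "D": 1}, "R4": {"D": -1, "B": 1},
    "BIO": {"B": -1},
}
INTRACELLULAR = {"A", "B", "C", "D"}
NAMES = sorted(S)


def connected(active: set[str]) -> bool:
    """Connectivity constraints (10)-(11) for the intracellular metabolites."""
    for met in INTRACELLULAR:
        produced = any(S[j].get(met, 0) > 0 for j in active)
        consumed = any(S[j].get(met, 0) < 0 for j in active)
        if produced != consumed:
            return False
    return True


def satisfies_cuts(active: set[str], cuts: list[set[str]]) -> bool:
    """Integer cut for each earlier solution y*: sum over {j : y*_j = 0} of y_j >= 1."""
    return all(any(j in active for j in NAMES if j not in previous) for previous in cuts)


def solve(cuts: list[set[str]], biomass: str = "BIO") -> set[str] | None:
    """Smallest feasible set under the cuts (exhaustive search stands in for the MILP)."""
    for size in range(1, len(NAMES) + 1):
        for combo in combinations(NAMES, size):
            active = set(combo)
            if biomass in active and connected(active) and satisfies_cuts(active, cuts):
                return active
    return None


def enumerate_minimal_sets(biomass: str = "BIO") -> list[set[str]]:
    """Accumulate integer cuts until the next solution is no longer minimal."""
    cuts: list[set[str]] = []
    solutions: list[set[str]] = []
    while True:
        candidate = solve(cuts, biomass)
        if candidate is None or (solutions and len(candidate) > len(solutions[0])):
            return solutions
        solutions.append(candidate)
        cuts.append(candidate)


if __name__ == "__main__":
    for solution in enumerate_minimal_sets():
        print(sorted(solution))  # ['BIO', 'R1', 'T_A'] then ['BIO', 'R2', 'T_C']
```

Because every cut only demands a reaction outside an earlier solution, the cuts exclude that solution and all of its subsets without restricting any other candidate set.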



All optimization problems are solved using CPLEX 6.0 accessed through the modeling environment GAMS on an IBM RS6000-270 workstation. The total cumulative CPU time expended for this study was on the order of 400 hours.

10. OPTIONS AND VARIATIONS

The present invention contemplates any number of ways in which the modeling framework of the present invention can be applied to solve metabolic network problems. The framework of the present invention uses a systematic approach to improve upon flux balance models using qualitative information, such as information that can be used to define logic constraints. This information can include qualitative kinetic information constraints, qualitative regulatory information constraints, differential DNA microarray experimental data constraints, and other logic constraints.

The modeling framework of the present invention can be applied to solve various metabolic problems. These include determining the effect of gene additions and/or deletions, determining optimal gene additions, determining lethal gene deletions, and determining minimal reaction sets, as well as determining other metabolic manipulations. These and other problems may include requirements of a particular growth rate, certain environmental conditions, or other conditions.

As the modeling framework of the present invention is in silico, it is not limited in any way to a particular organism. The present invention contemplates that any number of organisms can be modeled. The spirit and scope of the invention should be construed broadly to include all that is claimed and any equivalents thereof.






What is claimed is: 

1. A method for modeling cellular metabolism of an organism, comprising: constructing a flux balance analysis model; applying constraints to the flux balance analysis model, the constraints selected from the set consisting of: qualitative kinetic information constraints, qualitative regulatory information constraints, and differential DNA microarray experimental data constraints.

2. The method of claim 1 wherein the constraints are logic constraints selected to protect against violation of a kinetic or regulatory barrier.

3. The method of claim 1 wherein the constraints are connectivity constraints.

4. The method of claim 1 further comprising the step of applying mixed-integer linear programming to solve for a desired metabolic outcome.

5. The method of claim 1 further comprising the step of solving for a desired metabolic outcome.

6. A method for modeling cellular metabolism of an organism that improves upon a flux balance analysis model, comprising: constructing the flux balance analysis model; and applying a plurality of logic constraints to the flux balance analysis model.

7. The method of claim 6, further comprising selecting the set of logic constraints to protect against violation of a kinetic or regulatory barrier.

8. The method of claim 6 wherein the logic constraints are defined by a relationship between changes in reaction fluxes and metabolic concentrations.



9. The method of claim 6 wherein the logic constraints are defined by a relationship between reaction fluxes and transcript levels of gene coding.

10. The method of claim 6 wherein the logic constraints are represented by binary variables.

11. The method of claim 10 wherein a first binary variable represents the presence of a reaction and a second binary variable represents the absence of a reaction.

12. The method of claim 6 further comprising applying a computational procedure to identify a minimal set of metabolic reactions.

13. The method of claim 12 further comprising selecting a growth rate, and wherein the step of applying a computational procedure is applying a computational procedure to identify the minimal set of metabolic reactions capable of supporting the growth rate.

14. The method of claim 6 further comprising the step of applying mixed-integer linear programming to solve for a desired metabolic outcome.

15. The method of claim 6 further comprising the step of solving for a desired metabolic outcome.

16. The method of claim 15 further comprising engineering a change in an organism based on the desired metabolic outcome.

17. A method for determining a reduced genome, comprising: selecting a minimal set of reactions from a set of metabolic reactions that meets a growth rate target; mapping enzymes catalyzing the minimal set of reactions to a corresponding set of coding genes, the corresponding set of coding genes defining a reduced genome.

18. The method of claim 17 wherein the growth rate target is a biomass target production rate.

19. A system for modeling cellular metabolism of an organism, comprising: a flux balance analysis model; a plurality of constraints applied to the flux balance analysis model, the constraints selected from the set consisting of: qualitative kinetic information constraints, qualitative regulatory information constraints, and differential DNA microarray experimental data constraints.






MODEL PREDICTIONS OF MAXIMUM THEORETICAL YIELDS OF AMINO ACIDS FOR GROWTH ON GLUCOSE AND ACETATE

Maximum Theoretical Yield (mmol per 10 mmol Glucose)

Amino Acid      Palsson '93   Modified Keasling '97   Universal Model   % Increase
Alanine         20.00         20.00                   20.00
Arginine         7.74          9.26                   10.07             8.75%
Asparagine      15.60         18.18                   19.23             5.77%
Aspartate       18.20         20.00                   20.00
Cysteine         9.75         11.49                   11.90             3.57%
Glutamate       10.00         13.33                   13.33
Glutamine       10.00         13.33                   13.33
Glycine         20.00         35.33                   35.33
Histidine        7.30          9.77                    9.80             0.23%
Isoleucine       7.34          8.00                    8.07             0.91%
Leucine          6.67          8.00                    8.00
Lysine           7.84          8.45                    8.45
Methionine       5.74          7.04                    7.19             2.16%
Phenylalanine    5.29          5.76                    5.76
Proline         10.00         10.91                   10.91
Serine          20.00         23.04                   23.04
Threonine       12.30         15.00                   15.00
Tryptophan       4.14          4.67                    4.73             1.28%
Tyrosine         5.48          6.03                    6.03
Valine          10.00         10.00                   10.00


Maximum Theoretical Yield (mmol per 10 mmol Acetate)

Amino Acid      Palsson '93   Modified Keasling '97   Universal Model   % Increase
Alanine          3.93          5.29                    5.29
Arginine         1.51          2.43                    2.65             9.05%
Asparagine       3.24          4.66                    4.91             5.45%
Aspartate        3.32          5.29                    5.29
Cysteine         1.81          3.29                    3.42             3.80%
Glutamate        2.68          3.65                    3.65
Glutamine        2.50          3.46                    3.46
Glycine          3.94          9.00                    9.00
Histidine        1.37          2.43                    2.54             4.53%
Isoleucine       1.44          2.13                    2.13
Leucine          1.59          2.18                    2.18
Lysine           1.55          2.18                    2.18
Methionine       1.11          1.81                    1.85             2.46%
Phenylalanine    1.00          1.47                    1.47
Proline          2.10          2.90                    2.90
Serine           3.94          5.87                    5.87
Threonine        2.50          3.91                    3.91
Tryptophan       0.76          1.17                    1.19             1.32%
Tyrosine         1.03          1.54                    1.54
Valine           1.96          2.67                    2.67

Palsson '93: E. coli model proposed by Palsson (1993)
Modified Keasling '97: Modified Keasling (1997) E. coli model as described in text
Universal Model: Modified Keasling (1997) E. coli model augmented with non-E. coli reactions compiled by the Kyoto Encyclopedia of Genes and Genomes
% Increase: Between the modified Keasling (1997) model and the Universal model

Fig. 9






[Figure: two bar charts; x-axis: TARGET % OF MAXIMUM GROWTH RATE]






MODIFICATIONS TO THE PRAMANIK AND KEASLING MODEL

Enzyme: Reaction

Reactions assumed irreversible
Phosphofructokinase: Fructose-1,6-bisphosphate -> Fructose-6-phosphate + Pi
Citrate Synthase: Acetyl-CoA + Oxaloacetate -> CoA + Citrate
2-Ketoglutarate Dehydrogenase: 2-Ketoglutarate + NAD + CoA -> Succinyl-CoA + CO2 + NADH
PRSCAIM Synthetase: RCAIM + ATP + Aspartate -> ADP + Pi + PRSCAIM
Glycerol Kinase: Glycerol + ATP -> Glycerol-3-phosphate + ADP

Reactions removed from model
Unknown Pathway: 5'-methylthioadenosine -> Adenosine + Methionine
Cystathionase: Homocysteine + Adenosine <-> S-Adenosyl-homocysteine
Sulfotransferase: Adenosine-3,5-diphosphate + sulfite <-> 3-Phosphoadenylylsulfate

Reactions modified
Fructose-1,6-bisphosphate Aldolase: Fructose-1,6-bisphosphate -> Fructose-6-phosphate + Pi
Isocitrate Dehydrogenase: Isocitrate + NADP -> CO2 + NADPH + 2-Ketoglutarate
Succinate Thiokinase: Succinyl-CoA + ADP + Pi <-> ATP + CoA + Succinate
Prephenate Dehydrogenase: Prephenate + NAD -> CO2 + NADH + para-Hydroxyphenylpyruvate
Hol Dehydrogenase: Histidinol + 3 NAD -> 3 NADH + Histidine
RCAIM Synthetase: AIR + CO2 + ATP -> 5-P-Ribosyl-4-carboxy-5-aminoimidazole + ADP + Pi
GTP Cyclohydrolase: GTP -> D6RP5P + Formate + PPi
3,4-Dihydroxy-2-Butanone-4-Phosphate Synthase: Ribulose-5-phosphate -> DB4P + Formate
H2Neopterin Triphosphate Pyrophosphatase: AHTD -> PPi + Pi + DHP
CoA Synthase: OIVAL + METTHF + NADPH + ALA + CTP + 4 ATP + CYS -> THF + NADP + AMP + 2 PPi + 2 ADP + CO2 + CoA + CDP

MODIFICATIONS BASED ON INFORMATION BY KARP (1999)

Fig. 12






[Figure: genes selected for removal (including unc, fumAB, mdh, sucCD, sdhABCD, ackA, dgkA, gcvHTP, ndk, ppa, pps, and pta) at each Percentage of Optimal Biomass Produced, from 90% to 0.1%]
Legend: same gene responsible for two intracellular reactions (two marker types); no gene has been assigned to these intracellular reactions.

Fig. 13






GENES SELECTED FOR REMOVAL BY KNOCKOUT STUDY

Enzyme (Gene): Reaction
[first entry illegible in source]
Acetate Kinase (ackA): AC + ATP -> ACTP + ADP
CDP Kinase (ndk, a): CDP + ATP -> CTP + ADP
CMP Kinase (ndk, b): CMP + ATP -> CDP + ADP
F0F1-ATPase (unc): ADP + Pi + Hext -> ATP
Formate THF Ligase (FTL): THF + FORMATE + ATP -> ADP + Pi + FTHF
Fumarase (fumAB): FUM -> MAL
Glyceraldehyde Kinase (GK): GLAL + ATP -> ADP + T3P1
Glycine Cleavage System (gcvHTP): GLY + THF + NAD -> METTHF + NADH + CO2 + NH3
Malate Dehydrogenase (mdh): MAL + NAD -> NADH + OA
Methenyl THF Cyclohydrolase (folD, f): METHF -> FTHF
Methylene THF Dehydrogenase (folD, g): METTHF + NADP -> METHF + NADPH
NADH Dehydrogenase 1 (ndh): NADH + Q -> NAD + QH2 + 4 Hext
PEP Synthase (pps): PYR + ATP -> PEP + AMP + Pi
Phosphatidate Phosphatase (dgkA): DGR + Pi -> PA
Phosphotransacetylase (pta): ACTP + COA -> ACCOA + Pi
Pyrophosphatase (ppa): PPi -> 2 Pi
Succinate Dehydrogenase (sdhABCD): SUCC + FAD -> FADH2 + FUM
Succinate Thiokinase (sucCD): SUCCOA + GDP + Pi -> GTP + COA + SUCC

a,b Same gene responsible for two intracellular reactions
f,g Same gene responsible for two intracellular reactions
c,d,e No gene has been assigned to these intracellular reactions

Fig. 14






i MODEL SELECTIONS OF ENZVMA J IC REACi IONS llLAl' WILL 


i ENHANCE THE AMINO ACID PRODUCTION CAPABILITIES OF 


i ESCHERICHIA COLl 






1 Amino Acid 


Substrate 


EC# 


Enyzme 



Reaction Catalyzed i 


r\\ ^ 11 II 1 IC 




'"•7 1 on 


G-Phospf-Ofructokinase (pyrophosphate) 


Fructoso-C-P + PPi Fructose-1,6-Bisphosphate + Pi [ 






l-!.7.2.2 


Carbaniale kinase 


ATP -f r.'H + C02 -> ADP + Carbamoyl Phosphate 1 






y o o 


Carbamate kinase 


ATP + r JH 3 + 002 -> ADP + Carbamoyl Phosphate 






L'.7.2.12 


Acetate kinase (pyrophosphate) 


Acetate + PPi -> Pi + Acelyl-Phosphate 


1 Asparaglne 


Glucose/ 
Acetate: 


-:..3.1.4 


Aspartate — ammonia ligase (ADP- 
forminc]) 


ATP + f JH3 + L-Aspar:ate Pi + ADP + L-Asparagine 1 


I Cysteine 


Glucose/ 
Acetate; 


:;.7.7.5 


Sulfate adenylyltransferase (ADP) 


Sulfate ADP Pi + Adenylyl-Sulfate 1 


\ Histidine 


Glucose: 


1.4.1.10 


Glycine dehydrogenase 


NAD + ( jivriine alvoxvlatp + NADH ■+- MH"^ ! 






2.7.1.90 


6-Phosphofructoklnase (pyrophosphate) 


Fructose-6-P + PPi Fructose-1 ,6-Bisphosphate + Pi ' 




Acetate: 


1.4.110 


Glycine dehydrogenase 


NAD + glycine ~> glyoxylate + NADH + NH3 I 






4.1.1.38 


Phosphoenolpyruvate carboxykinase 

(pyrophosphate) 


PPi + O 'taloacetate -> C02 + Pi + PEP 1 


I Isoleucine 


Glucose: 


many 






1 Methionine 


Glucose: 


2.7.7.5 


Sulfate adenylyltransferase (AOP) 


Sulfate + ADP Pi + Adenylyl-Sulfate 1 




Acetate: 


1.4.1.10 


Glycine dehydrogenase 


NAD + glycine -t giyoxylatiB + NADH + NH3 • 






2.7.7.5 


Sulfate adenylyltransferase (ADP) 


Sulfate ^ ADP — Pi + Adenylyl-Sulfate \ 






2.7.9.1 


Pyruvate, phosphate dikinase 


Pyruvate + Pi + ATP AMP + PPi + PEP 1 






4.1.1.33 


Phosphoenolpyruvate carboxykinase 
(pyrophosphate) 


PPi + 0.".aloacetate — C02 + Pi + PEP ; 


Tryptophan 


Glucose: 


2.7.1.90 


6-Pho3phofructokinase (pyrophosphate) 


Fructoso-6-P + Ppi Fructose-1 .6-Bisphosphate Pi 1 






2.7.9.1 


Pyruvate, phosphate dikinase 


Pyruvate + Pi + ATP -> AMP + PPi + PEP • 1 




Acetate- 


2.7.9.1 


Pyruvate, phosphate dikinase 


Pyruvate + Pi + ATP AMP + PPi + PEP ; 






4.1.1.38 


Phosphoenolpyruvate carboxykinase (pyrophosphate)


PPi + Oxaloacetate -> CO2 + Pi + PEP





Fig. 15 
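The reactions in Fig. 15 can be screened for elemental balance programmatically. The Python sketch below is illustrative only: it assumes neutral, fully protonated molecular formulas (the `FORMULAS` table is supplied here and is not part of the figure), ignores charges and protons, and covers the ATP-, ADP- and PPi-dependent rows. The names `FORMULAS`, `REACTIONS`, `parse_formula`, `side_atoms` and `imbalance` are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: neutral, fully protonated formulas; charges and H+ are ignored.
# These formulas are supplied for illustration and are not part of Fig. 15.
FORMULAS = {
    "ATP": "C10H16N5O13P3", "ADP": "C10H15N5O10P2", "AMP": "C10H14N5O7P",
    "PPi": "H4P2O7", "Pi": "H3PO4", "NH3": "NH3", "CO2": "CO2",
    "Fructose-6-P": "C6H13O9P", "Fructose-1,6-Bisphosphate": "C6H14O12P2",
    "Carbamoyl Phosphate": "CH4NO5P",
    "Acetate": "C2H4O2", "Acetyl-Phosphate": "C2H5O5P",
    "L-Aspartate": "C4H7NO4", "L-Asparagine": "C4H8N2O3",
    "Sulfate": "H2SO4", "Adenylyl-Sulfate": "C10H14N5O10PS",
    "Oxaloacetate": "C4H4O5", "PEP": "C3H5O6P", "Pyruvate": "C3H4O3",
}

# Reactions as listed in Fig. 15; every stoichiometric coefficient is 1.
REACTIONS = {
    "6-Phosphofructokinase (pyrophosphate)":
        (["Fructose-6-P", "PPi"], ["Fructose-1,6-Bisphosphate", "Pi"]),
    "Carbamate kinase":
        (["ATP", "NH3", "CO2"], ["ADP", "Carbamoyl Phosphate"]),
    "Acetate kinase (pyrophosphate)":
        (["Acetate", "PPi"], ["Pi", "Acetyl-Phosphate"]),
    "Aspartate-ammonia ligase (ADP-forming)":
        (["ATP", "NH3", "L-Aspartate"], ["Pi", "ADP", "L-Asparagine"]),
    "Sulfate adenylyltransferase (ADP)":
        (["Sulfate", "ADP"], ["Pi", "Adenylyl-Sulfate"]),
    "Phosphoenolpyruvate carboxykinase (pyrophosphate)":
        (["PPi", "Oxaloacetate"], ["CO2", "Pi", "PEP"]),
    "Pyruvate, phosphate dikinase":
        (["Pyruvate", "Pi", "ATP"], ["AMP", "PPi", "PEP"]),
}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a parenthesis-free formula such as 'C6H13O9P'."""
    atoms: Counter = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms


def side_atoms(species: list[str]) -> Counter:
    """Sum the atom counts of every species on one side of a reaction."""
    total: Counter = Counter()
    for name in species:
        total.update(parse_formula(FORMULAS[name]))
    return total


def imbalance(lhs: list[str], rhs: list[str]) -> dict[str, int]:
    """Return {element: right - left} for every element that does not balance."""
    left, right = side_atoms(lhs), side_atoms(rhs)
    return {e: right[e] - left[e] for e in sorted(set(left) | set(right))
            if right[e] != left[e]}


if __name__ == "__main__":
    for name, (lhs, rhs) in REACTIONS.items():
        diff = imbalance(lhs, rhs)
        print(f"{name}: {'balanced' if not diff else diff}")
```

Under these assumptions every listed reaction balances; a non-empty dictionary would point to a missing co-substrate such as water or a proton.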






[Flux distribution map; image not reproduced]

Fig. 16






[Flux distribution map; image not reproduced]

Fig. 18






[Flux distribution map; image not reproduced]



Fig. 19 



[Flux distribution map; image not reproduced]



Fig. 20 



[Plot; x-axis: Target % of Maximum Growth Rate (100% to 1%)]



Fig. 21 






EVOLUTION OF MINIMAL REACTION SETS FOR CASE (I)
UNDER DECREASING GROWTH REQUIREMENTS.



Target % of Maximum Growth Rate | Minimal Reaction Set (# Reactions) | Key Features


100% | 234 | The glycolysis, tricarboxylic acid cycle, and pentose phosphate pathways all operate in their forward directions, optimally generating the energy cofactors ATP, NADH, and NADPH required for cell growth. All available glucose is oxidized into the cell's only secreted byproduct, carbon dioxide.


90% | 229 | The fluxes through two TCA cycle reactions, 2-ketoglutarate dehydrogenase and succinate dehydrogenase, are zero, while succinyl-CoA synthetase operates in its reverse direction, suggesting a less demanding energetic state under the submaximal growth demands. Acetate is now secreted as a byproduct along with carbon dioxide.


80% | 228 | Fluxes through two additional TCA cycle reactions, fumarase and malate dehydrogenase, are eliminated, while a reaction secreting fumarate is added.


70% | 226 | The pentose phosphate pathway operates solely for nucleotide biosynthesis, with the fluxes through ribulose phosphate 3-epimerase, transketolase I, transketolase II, and transaldolase B all running in reverse. Fluxes through glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase are absent in this case; pyridine nucleotide transhydrogenase replaces them and meets the cellular NADPH needs. In addition, formate is now secreted along with acetate, fumarate, and carbon dioxide.


60%, 50%, 40% | 225 | Acetate is no longer secreted as a metabolic byproduct but is converted to acetyl-CoA by acetyl-CoA synthetase.


30%, 20%, 10%, 1% | 224 | Three glycolytic reactions (phosphoglycerate mutase, enolase, and pyruvate kinase) are eliminated, but serine deaminase and phosphoenolpyruvate synthase are both added to supply the cell with phosphoenolpyruvate.



Fig. 22 
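The Case (I) table implies a net loss of ten reactions (234 to 224) as the growth requirement falls from 100% to 1%. The small Python sketch below, using hypothetical names, transcribes the tabulated sizes (expanding the grouped rows to one entry per target) and reports the net number of reactions removed at each step.

```python
# Minimal reaction set sizes for Case (I), transcribed from the Fig. 22 table;
# grouped rows (60%, 50%, 40% and 30%, 20%, 10%, 1%) are expanded one per target.
CASE_I = [(100, 234), (90, 229), (80, 228), (70, 226), (60, 225), (50, 225),
          (40, 225), (30, 224), (20, 224), (10, 224), (1, 224)]


def net_reactions_removed(table):
    """Return (higher target %, lower target %, net reactions removed) per step."""
    return [(t_hi, t_lo, n_hi - n_lo)
            for (t_hi, n_hi), (t_lo, n_lo) in zip(table, table[1:])]


if __name__ == "__main__":
    for t_hi, t_lo, removed in net_reactions_removed(CASE_I):
        print(f"{t_hi}% -> {t_lo}%: {removed} fewer reactions")
```

The counts are net changes. At 80%, for instance, fumarase and malate dehydrogenase drop out while a fumarate-secretion reaction is added, which matches the net decrease of one reaction from 229 to 228; at 30%, three glycolytic reactions are removed and two are added, again a net decrease of one.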






METABOLITES UPTAKEN OR SECRETED AT EACH TARGET GROWTH
RATE ON AN OPTIMALLY ENGINEERED MEDIUM.
U - DENOTES METABOLITE UPTAKE
S - DENOTES METABOLITE SECRETION









[Table not legibly reproduced. Columns: Percentage of 100% Biomass Generation Required (legible headings include 100%, 99%, 95%, 85%, and 80%). Rows (metabolites): Acetate, Acetaldehyde, Adenine, Adenosine, Alanine, Arginine, Asparagine, Aspartate, Carbon dioxide, Cysteine, D-Alanine, Thymidine, Ethanol, Glycerol, Glycerol-3-phosphate, Glutamine, Glutamate, Glycine, Guanine, Guanosine, Histidine, Isoleucine, Leucine, Lysine, meso-Diaminopimelate, Methionine, Mannitol, Ammonia, Oxygen, Phenylalanine, Phosphate, Proline, Putrescine, Pyruvate, Ribose, Serine, Spermidine, Threonine, Tryptophan, Tyrosine, Uracil, Uridine, Valine. Each cell is marked U or S; the cell-by-cell entries are not recoverable from the scan. Final row, # Metabolites Uptaken (legible values, in order): 12, 21, 22, 24, 26, 28, 29, 31, 29, 34.]





Fig. 23 



[Plot; x-axis: Target % of Maximum Growth Rate]



Fig. 24 



[Plot; x-axis: Target % of Maximum Growth Rate]



Fig. 25 






EVOLUTION OF MINIMAL REACTION SETS FOR CASE (II)
UNDER DECREASING GROWTH REQUIREMENTS.



Target % of Maximum Growth Rate | Minimal Reaction Set (# Reactions) | Key Features


100% | 201 | The organic material transported into the cell includes ethanol and glycerol-3-phosphate, which fuel glycolysis, the TCA cycle, and the PPP. The flux directions of the glycolysis pathway are split: all reaction fluxes preceding glyceraldehyde-3-phosphate (G3P) dehydrogenase operate in reverse, while G3P dehydrogenase and all fluxes following it operate in their forward directions. Putrescine, spermidine, and five amino acids are transported into the network, eliminating the need for biosynthetic pathways for these components.


90% | 132 | While the PPP and TCA cycle reactions are still functional, the network no longer uses the five glycolytic reactions from glyceraldehyde-3-phosphate dehydrogenase to pyruvate kinase. Consequently, the TCA cycle is fueled entirely by imported ethanol and acetate rather than by flux from the glycolysis pathway.


80% | 125 | This network tolerates the complete elimination of the TCA cycle and the glyoxylate shunt. As a result, the pentose phosphate pathway reactions are no longer restricted to nucleotide biosynthesis but now also form cellular NADPH. Most of this NADPH is subsequently converted to NADH by pyridine nucleotide transhydrogenase, replacing the cellular reducing power lost through the inactivity of the TCA cycle.


70% | 124 | A slightly less efficient set of internal metabolic reactions allows the growth demands to be met while importing one less metabolite (i.e., one less transport reaction) than the 80% network.


60%, 50%, 40%, 30%, 20% | 123 | Neither the TCA cycle nor the PPP is used for reducing power. Most of the cellular reducing capability is now generated by the uptake of ethanol and its subsequent conversion into acetyl-CoA.


10%, 1% | 122 | This minimal network consists mostly of cell envelope and membrane lipid biosynthetic reactions, along with a number of transport and salvage pathway reactions. Here the three core metabolic routes (glycolysis, the TCA cycle, and the pentose phosphate pathway) are almost completely dismantled, with only one glycolytic and four PPP reactions remaining.



Fig. 26 
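The Case (II) network shrinks much more sharply than the Case (I) network. As a rough illustration, the Python sketch below (hypothetical names; sizes transcribed from the Case (II) table, with the Case (I) endpoints of 234 and 224 reactions taken from Fig. 22) computes the relative reduction between the 100% and 1% targets and locates the largest single step.

```python
# Minimal reaction set sizes for Case (II), transcribed from the Fig. 26 table.
CASE_II = {100: 201, 90: 132, 80: 125, 70: 124, 60: 123, 50: 123,
           40: 123, 30: 123, 20: 123, 10: 122, 1: 122}
# Endpoints of Case (I), from the Fig. 22 table, for comparison.
CASE_I_ENDPOINTS = {100: 234, 1: 224}


def percent_reduction(sizes, start=100, end=1):
    """Percentage by which the minimal set shrinks between two target rates."""
    return 100.0 * (sizes[start] - sizes[end]) / sizes[start]


def largest_step(sizes):
    """Return (higher target, lower target, net reactions removed) for the biggest drop."""
    targets = sorted(sizes, reverse=True)
    steps = [(a, b, sizes[a] - sizes[b]) for a, b in zip(targets, targets[1:])]
    return max(steps, key=lambda step: step[2])


if __name__ == "__main__":
    print(f"Case (II): {percent_reduction(CASE_II):.1f}% fewer reactions at 1% than at 100%")
    print(f"Case (I):  {percent_reduction(CASE_I_ENDPOINTS):.1f}%")
    print("Largest Case (II) step:", largest_step(CASE_II))
```

Under these figures the Case (II) set shrinks by about 39.3% versus about 4.3% for Case (I), and 69 of the 79 reactions removed in Case (II) disappear between the 100% and 90% targets.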






FUNCTIONAL CLASSIFICATION OF MINIMAL
NETWORK REACTIONS FOR GROWTH ON AN OPTIMALLY
ENGINEERED MEDIUM.



Functional Classification | # rxns
[illegible in scan] | 1
Alternative Carbon Source | 7
Anaplerotic Reactions | 1
Cell Envelope Biosynthesis | 29
EMP Pathway | 5
Membrane Lipid Biosynthesis | 16
Pentose Phosphate Pathway | 4
Pyrimidine Biosynthesis | 1
Respiration | 5
Salvage Pathways | 17
Transport | 36
Total | 122



Fig. 27
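The Fig. 27 counts sum to the 122 reactions of the minimal network. The Python sketch below (hypothetical names; the first row's label is illegible in the scan, so a placeholder name is used) tallies the classes and expresses each as a share of the network.

```python
# Reaction counts per functional class, transcribed from Fig. 27. The label of
# the first row is illegible in the scan, so a placeholder name is used here.
FIG27 = {
    "(illegible first row)": 1,
    "Alternative Carbon Source": 7,
    "Anaplerotic Reactions": 1,
    "Cell Envelope Biosynthesis": 29,
    "EMP Pathway": 5,
    "Membrane Lipid Biosynthesis": 16,
    "Pentose Phosphate Pathway": 4,
    "Pyrimidine Biosynthesis": 1,
    "Respiration": 5,
    "Salvage Pathways": 17,
    "Transport": 36,
}


def shares(counts):
    """Percentage of the minimal network occupied by each functional class."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("Total reactions:", sum(FIG27.values()))
    for name, pct in sorted(shares(FIG27).items(), key=lambda kv: -kv[1]):
        print(f"{name:30s} {pct:5.1f}%")
    core = ["Cell Envelope Biosynthesis", "Membrane Lipid Biosynthesis",
            "Transport", "Salvage Pathways"]
    print("Envelope + lipid + transport + salvage:", sum(FIG27[k] for k in core))
```

Transport alone accounts for about 29.5% of the reactions, and the cell envelope, membrane lipid, transport, and salvage classes together cover 98 of the 122, in line with the description of the 10% and 1% networks in Fig. 26.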






COMPARISON OF MINIMAL METABOLIC GENE/REACTION
SETS BASED ON FUNCTIONAL CLASSIFICATION





Metabolic Function | Essential Gene Set, Ref. (2) (# Genes) | Minimal Gene Set, Ref. (5) (# Genes) | Minimal Reaction Set (# Reactions)
Amino acid biosynthesis | 0 | 0 | 1
Biosynthesis of cofactors, prosthetic groups, and carriers | 4 | 3 | 0
Cell envelope | [illegible] | 11 | 29
Central intermediary metabolism | 7 | 7 | 1
Energy metabolism | 31 | 32 | 21
Fatty acid and phospholipid metabolism | 5 | 7 | 16
Purines, pyrimidines, nucleosides, and nucleotides | 17 | 14 | 18
Transport and binding proteins | 17 | 25 | 36
Total | 83 | 99 | 122



Fig. 28 
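Each column of Fig. 28 carries a printed total, which makes it possible to cross-check the rows and to recover a single illegible entry. The Python sketch below is a hypothetical helper, not part of the original: it treats the illegible Cell envelope entry for Ref. (2) as unknown, reads the purines row's scanned "IS" as 18, and compares each column with its printed total.

```python
from typing import Dict, List, Optional, Tuple

# Fig. 28 rows: (Ref. (2) # genes, Ref. (5) # genes, minimal reaction set # reactions).
# None marks the Cell envelope / Ref. (2) entry, which is illegible in the scan;
# the purines row's reaction count, scanned as "IS", is read here as 18.
ROWS: Dict[str, Tuple[Optional[int], Optional[int], Optional[int]]] = {
    "Amino acid biosynthesis": (0, 0, 1),
    "Biosynthesis of cofactors, prosthetic groups, and carriers": (4, 3, 0),
    "Cell envelope": (None, 11, 29),
    "Central intermediary metabolism": (7, 7, 1),
    "Energy metabolism": (31, 32, 21),
    "Fatty acid and phospholipid metabolism": (5, 7, 16),
    "Purines, pyrimidines, nucleosides, and nucleotides": (17, 14, 18),
    "Transport and binding proteins": (17, 25, 36),
}
PRINTED_TOTALS = (83, 99, 122)


def check_columns(rows, totals) -> List[Tuple[str, Optional[int]]]:
    """Compare each column with its printed total, or infer a single missing entry."""
    results: List[Tuple[str, Optional[int]]] = []
    for col, total in enumerate(totals):
        values = [row[col] for row in rows.values()]
        known = sum(v for v in values if v is not None)
        n_missing = sum(v is None for v in values)
        if n_missing == 0:
            results.append(("ok" if known == total else "mismatch", known))
        elif n_missing == 1:
            results.append(("inferred", total - known))  # value implied by the total
        else:
            results.append(("underdetermined", None))
    return results


if __name__ == "__main__":
    for label, result in zip(("Ref. (2)", "Ref. (5)", "Reactions"),
                             check_columns(ROWS, PRINTED_TOTALS)):
        print(label, result)
```

With these transcriptions, the Ref. (5) and reaction-set columns sum exactly to 99 and 122, and the printed Ref. (2) total of 83 implies 2 genes for the cell envelope row, assuming the other Ref. (2) entries were read correctly.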



